Surface slag inclusions represent a pervasive and detrimental quality defect in continuous casting production. These non-metallic particles, entrained within or just beneath the slab surface, exhibit random distribution patterns. The hereditary nature of these inclusions means that even minor defects can propagate through subsequent hot rolling processes, manifesting as surface defects such as slivers or scabs on the final rolled product. This severely compromises product quality, reduces metal yield, and in extreme cases, can lead to catastrophic breakout events. Therefore, the online prediction of slag inclusion occurrence is paramount for implementing timely corrective grinding, thereby mitigating downstream quality risks and enhancing overall production economics.
Traditional methods for detecting slag inclusions, such as manual visual inspection or offline sampling analysis, are inherently inefficient, subjective, and incapable of real-time monitoring. While machine vision-based techniques offer automation, their accuracy is often compromised by interfering factors like oxide scale and mold flux residues on the hot slab surface. Consequently, machine learning (ML) algorithms, renowned for their powerful function approximation and self-learning capabilities, have emerged as a focal point for researchers aiming to develop robust prediction models. A significant challenge persists with existing ML-based models: they often achieve high accuracy on training data but fail to maintain consistent, high performance on unseen test data due to the lack of a systematic approach for hyperparameter optimization. This gap between training and generalization performance limits their practical reliability.
In this work, I address this critical issue. Leveraging a comprehensive slab sample dataset acquired from a deployed slab quality analysis and monitoring system, I developed and compared prediction models using Support Vector Machine (SVM), Random Forest (RF), and Adaptive Boosting (AdaBoost) algorithms. The core of my investigation focuses on establishing a quantifiable relationship between training accuracy metrics and final test accuracy. This relationship is then instrumentalized within a Particle Swarm Optimization (PSO) framework to systematically identify the optimal SVM hyperparameters that maximize generalization performance. The ultimate goal is to create a model capable of reliably predicting slag inclusion probability online, enabling a stratified inspection strategy to filter out defective slabs before rolling.
1. Influencing Factors and Data Foundation
The formation of surface slag inclusions is a complex phenomenon primarily attributed to the entrapment of exogenous inclusions (e.g., refining slag, refractories, mold flux) or endogenous inclusions (e.g., deoxidation products) at the meniscus region due to uneven fluid flow and heat transfer. A multitude of process parameters across steelmaking, refining, and casting influence inclusion generation and meniscus stability. Based on industrial expertise and system capability, I identified and collected data for 60 relevant features. These include parameters from the converter (e.g., main blow time, oxygen volume), secondary refining (e.g., LF stirring time, RH vacuum degree, steel/slag composition), and continuous casting (e.g., tundish sequence length, superheat, stopper rod position, SEN depth, mold level fluctuation, casting speed, mold thermal profiles, oscillation parameters). Operational events like tundish changes are also recorded.
The slab quality analysis and monitoring system performs the critical task of spatiotemporal matching, aligning high-frequency time-series process data (e.g., stopper position, casting speed) with individual slabs based on their entry and exit timestamps from the mold. For each slab and each high-frequency parameter, aggregated statistical features (like the mean value) are calculated over the slab’s casting duration. These features, combined with offline inspection results that label slabs as either having slag inclusions or being defect-free, form the structured slab sample dataset. The initial dataset comprised 194 slabs with confirmed slag inclusions and 669 defect-free slabs, indicating a significant class imbalance where defective samples are the minority.
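The slab-wise aggregation step can be illustrated with a minimal sketch. It assumes a hypothetical data layout (timestamped samples for one signal, plus mold entry and exit times per slab) and uses made-up numbers; it is not the monitoring system's actual implementation.

```python
from statistics import mean

def aggregate_by_slab(samples, slabs):
    """Average a high-frequency signal over each slab's time in the mold.

    samples: list of (timestamp, value) pairs for one process signal.
    slabs:   list of (slab_id, t_enter, t_exit) tuples (hypothetical layout).
    Returns {slab_id: mean value, or None if no sample falls in the window}.
    """
    features = {}
    for slab_id, t_enter, t_exit in slabs:
        # Keep only the samples recorded while this slab was in the mold.
        window = [v for t, v in samples if t_enter <= t < t_exit]
        features[slab_id] = mean(window) if window else None
    return features

# Illustrative casting-speed samples (time in s, speed in m/min); not plant data.
casting_speed = [(0, 1.0), (10, 1.5), (20, 1.25), (30, 1.0), (40, 0.75)]
slabs = [("slab_A", 0, 20), ("slab_B", 20, 50)]
speed_mean = aggregate_by_slab(casting_speed, slabs)
print(speed_mean)
```

The same function could be applied once per high-frequency parameter, and other statistics (standard deviation, maximum) could be added in the same way.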

2. Analytical Methodology
2.1 Data Preprocessing
Raw industrial data often contains missing values and outliers due to sensor faults or manual entry errors. Missing values were handled using deletion or mean imputation. For outlier detection and treatment, I employed the modified Z-score (denoted \( Z' \)), which is based on the median and is therefore less sensitive to extreme values than the standard Z-score. The modified Z-score is calculated as:
$$ Z' = \frac{x_i - M(x)}{1.4826 \times \mathrm{MAD}} $$
where \( M(x) \) is the median of the feature data and \( \mathrm{MAD} \) is the median absolute deviation. Data points with \( |Z'| > 3 \) were considered outliers. Outliers on the high side (\( Z' > 3 \)) were replaced with \( M(x) + 3 \times 1.4826 \times \mathrm{MAD} \), and outliers on the low side (\( Z' < -3 \)) were replaced with \( M(x) - 3 \times 1.4826 \times \mathrm{MAD} \).
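The outlier rule above amounts to clipping each feature at the median plus or minus three scaled MADs. A minimal sketch follows; the function name, the zero-MAD guard, and the sample values are illustrative assumptions.

```python
from statistics import median

def clip_outliers(values, threshold=3.0):
    """Replace points whose modified Z-score exceeds the threshold
    with the corresponding boundary value."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        # Assumption: a constant-like feature is left untouched.
        return list(values)
    lower = m - threshold * scale
    upper = m + threshold * scale
    # Clipping is equivalent to replacing |Z'| > threshold by the bound.
    return [min(max(v, lower), upper) for v in values]

# Illustrative feature values with one gross outlier (not plant data).
raw = [10, 11, 12, 13, 14, 100]
cleaned = clip_outliers(raw)
print(cleaned)
```

In this toy sample the median is 12.5 and the MAD is 1.5, so the upper bound is 12.5 + 3 × 1.4826 × 1.5 ≈ 19.17, and only the value 100 is replaced.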
2.2 Modeling Algorithms and Class Imbalance
I constructed prediction models using three distinct algorithms:
- Support Vector Machine (SVM): A powerful classifier that finds an optimal hyperplane to separate classes. To handle the inherent class imbalance, I used a class-weighted SVM. The optimization problem for this imbalanced binary classification is formulated as:
$$ \min_{\mathbf{w}, b, \boldsymbol{\xi}} \frac{1}{2} \|\mathbf{w}\|^2 + C^+ \sum_{i: y_i=+1} \xi_i + C^- \sum_{i: y_i=-1} \xi_i $$
$$ \text{subject to: } y_i(\mathbf{w}^T \phi(\mathbf{x}_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i=1,\ldots,n $$
Here, \( \mathbf{w} \) and \( b \) define the hyperplane, \( \xi_i \) are slack variables, \( \phi(\cdot) \) is the feature map induced by the kernel, which lifts the data into a higher-dimensional space, and \( C^+ \) and \( C^- \) are separate penalty parameters for the positive (slag inclusion) and negative classes, respectively. Choosing \( C^+ > C^- \) imposes a higher penalty for misclassifying the minority class.
- Random Forest (RF): An ensemble method aggregating predictions from multiple decision trees.
- Adaptive Boosting (AdaBoost): An ensemble method that combines weak learners, focusing successively on hard-to-classify samples.
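To make the effect of the separate penalties \( C^+ \) and \( C^- \) concrete, the following sketch evaluates the class-weighted SVM objective for a given hyperplane. It assumes a linear feature map (\( \phi(\mathbf{x}) = \mathbf{x} \)) and toy data; it only evaluates the objective and does not solve the optimization problem.

```python
def weighted_svm_objective(w, b, X, y, c_pos, c_neg):
    """Primal objective 0.5*||w||^2 + C+ * sum(slack of +1 samples)
    + C- * sum(slack of -1 samples), with phi(x) = x (assumption)."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        # Smallest slack satisfying y_i * score >= 1 - slack, slack >= 0.
        slack = max(0.0, 1.0 - yi * score)
        penalty += (c_pos if yi == 1 else c_neg) * slack
    return margin_term + penalty

# Toy data: +1 = slag inclusion, -1 = defect-free (illustrative only).
X = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

equal = weighted_svm_objective(w, b, X, y, c_pos=1.0, c_neg=1.0)
weighted = weighted_svm_objective(w, b, X, y, c_pos=4.0, c_neg=1.0)
print(equal, weighted)
```

With equal penalties, the margin violation of the positive sample contributes as much per unit of slack as that of the negative sample; raising \( C^+ \) makes the same hyperplane more expensive, which pushes the optimizer toward solutions that miss fewer positive samples. Libraries such as scikit-learn expose this idea through the `class_weight` argument of `SVC`, although the text does not state which implementation was used.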
To mitigate the class imbalance during model training, I applied the Synthetic Minority Oversampling Technique (SMOTE) to the minority class (slag inclusion samples). The optimal oversampling ratio was determined by observing the stabilization of correlation coefficients between key process variables and the target label as the minority sample size increased. For this dataset, a 4x oversampling ratio was found to be optimal.
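The interpolation at the core of SMOTE can be sketched in a few lines. This is a simplified illustration rather than a reference implementation: the neighbour count `k`, the seed, and the toy data are assumptions, and the "4x" ratio is interpreted here as growing the minority class to four times its original size (three synthetic samples per original one).

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy two-feature minority class (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5],
            [1.2, 2.2], [1.8, 2.1], [1.4, 2.4]]
synthetic = smote(minority, n_synthetic=3 * len(minority), k=3)
print(len(minority) + len(synthetic))  # minority size after oversampling
```

Oversampling should be applied to the training partition only, so that no synthetic point derived from a test slab leaks into training.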
2.3 Model Evaluation Metrics
For slag inclusion prediction, where failing to detect a defective slab (false negative) is more costly than a false alarm (false positive), standard accuracy is misleading. Therefore, I adopted the following metrics:
- False Negative Rate (FNR) or Miss Rate: The proportion of actual slag inclusion slabs incorrectly predicted as defect-free.
$$ FNR = \frac{FN}{TP + FN} $$
- False Positive Rate (FPR): The proportion of actual defect-free slabs incorrectly predicted as having slag inclusions.
$$ FPR = \frac{FP}{FP + TN} $$
- Fβ Score: The harmonic mean of precision (Pr) and recall (Rc), weighted by β. Setting β=2 places more emphasis on recall (minimizing FNR), which aligns with the operational priority.
$$ F_{\beta} = \frac{(1 + \beta^2) \cdot Pr \cdot Rc}{\beta^2 \cdot Pr + Rc}, \quad \text{where } Pr = \frac{TP}{TP+FP}, \quad Rc = \frac{TP}{TP+FN} $$
Here, TP, FN, FP, and TN represent True Positives, False Negatives, False Positives, and True Negatives, respectively. A higher F2 score indicates a better model for this specific task.
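The three metrics follow directly from the confusion-matrix counts. The sketch below uses round, illustrative counts (not results from this study) to show the computation.

```python
def rates_and_fbeta(tp, fn, fp, tn, beta=2.0):
    """Return (FNR, FPR, F-beta) from confusion-matrix counts."""
    fnr = fn / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return fnr, fpr, fbeta

# Illustrative counts: 100 slabs with slag inclusions, 200 defect-free.
fnr, fpr, f2 = rates_and_fbeta(tp=80, fn=20, fp=40, tn=160)
print(fnr, fpr, round(f2, 4))
```

For β=2 the score simplifies to \( 5\,TP / (5\,TP + 4\,FN + FP) \), which makes explicit that each missed defective slab weighs four times as much as a false alarm.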
2.4 Hyperparameter Optimization via PSO
The performance of models like SVM critically depends on hyperparameters (e.g., \( C^+, C^-, \) kernel parameters). I employed Particle Swarm Optimization (PSO), a bio-inspired stochastic algorithm, for systematic search. In PSO, a swarm of particles (each representing a candidate hyperparameter set) moves through the search space. Each particle’s position \( \mathbf{X}_i \) and velocity \( \mathbf{V}_i \) in a D-dimensional space are updated iteratively based on its own best-known position (\( \mathbf{P}_{best,i} \)) and the swarm’s global best-known position (\( \mathbf{G}_{best} \)):
$$ \mathbf{V}_i^{k+1} = \omega \mathbf{V}_i^{k} + c_1 r_1 (\mathbf{P}_{best,i}^{k} - \mathbf{X}_i^{k}) + c_2 r_2 (\mathbf{G}_{best}^{k} - \mathbf{X}_i^{k}) $$
$$ \mathbf{X}_i^{k+1} = \mathbf{X}_i^{k} + \mathbf{V}_i^{k+1} $$
where \( \omega \) is the inertia weight, \( c_1 \) and \( c_2 \) are acceleration coefficients, and \( r_1, r_2 \) are random numbers in [0,1]. The key innovation in my approach was the fitness function used to evaluate each particle’s position. Instead of using the test set F2 score directly (which would leak test information into the tuning loop and invalidate the final evaluation), I used a surrogate model predicting the test F2 score based solely on training metrics.
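The velocity and position updates can be sketched as a small, generic PSO maximizer. The parameter values (\( \omega = 0.7 \), \( c_1 = c_2 = 1.5 \)), swarm size, bound clamping, per-dimension random numbers, and the toy fitness function are all assumptions chosen for illustration; the settings used in this study are not reproduced here.

```python
import random

def pso_maximize(fitness, bounds, n_particles=20, n_iter=100,
                 omega=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a box given as [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (omega * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, clamped to the search bounds (assumption).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Toy fitness with a known maximum at (1, -2); stands in for the real one.
def toy_fitness(x):
    return -((x[0] - 1.0) ** 2) - ((x[1] + 2.0) ** 2)

best, best_fit = pso_maximize(toy_fitness, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_fit)
```

In the hyperparameter setting, each position vector would hold a candidate \( (C^+, C^-, \gamma) \) and `fitness` would be the surrogate-based score rather than the toy function.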
3. Results and Discussion
3.1 Comparative Model Performance
I conducted 5,000 randomized training-testing experiments for each algorithm (SVM, RF, AdaBoost). For each experiment, 70% of data was used for training (oversampled) and 30% for testing, with hyperparameters randomly selected within bounds. The average performance metrics are summarized below.
| Algorithm | Training FPR | Training FNR | Training F2 Score | Test FPR | Test FNR | Test F2 Score |
|---|---|---|---|---|---|---|
| SVM | 0.142 | 0.098 | 0.871 | 0.251 | 0.224 | 0.683 |
| Random Forest | ~0.0 | ~0.0 | ~1.0 | 0.239 | 0.381 | 0.542 |
| AdaBoost | 0.121 | 0.115 | 0.862 | 0.218 | 0.267 | 0.651 |
The analysis reveals a critical trade-off. The Random Forest model exhibited near-perfect performance on the training data, indicating severe overfitting and poor generalization, as evidenced by its high test FNR (38.1%) and low F2 score (0.542). While AdaBoost showed a slightly better test FPR than SVM (0.218 versus 0.251), its test FNR was higher (26.7% versus 22.4%). The SVM model demonstrated the most favorable balance between fitting and generalization, achieving the highest test F2 score (0.683) and the lowest test FNR of the three algorithms. Given that minimizing missed detections (FNR) is the primary industrial concern, the SVM algorithm was selected as the foundation for the final optimized prediction model.
3.2 Modeling the Relationship Between Training and Test Performance
The central challenge is optimizing hyperparameters for maximum test performance using only training data. To bridge this gap, I analyzed the 5,000 runs from the SVM experiments to establish a relationship between training metrics and the ultimate test F2 score. Let \( t_1 \), \( t_2 \), and \( t_3 \) represent the standardized (zero mean, unit variance) values of training FPR, training FNR, and training F2 score, respectively. The target variable is the test F2 score \( f \). A polynomial regression model was constructed, and after feature selection based on coefficient magnitude, the following significant relationship was derived:
$$ f(t_1, t_2, t_3) = 0.62 - 1.19 t_2 t_3 - 0.89 t_3^2 - 0.40 t_2^2 - 0.35 t_1 t_3 - 0.28 t_1 t_2 $$
This model, with a mean squared error of 0.00256 and an R² of 0.610, explains about 61% of the variance in test F2 score and captures the nonlinear interdependence between training performance indicators and the expected test performance. It indicates that a very high training F2 score (\( t_3 \)) alone is not predictive of good test performance; its square and interaction with FNR have strong negative coefficients. This surrogate model \( f(t) \) became the fitness function for the PSO algorithm.
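The surrogate polynomial is straightforward to evaluate. The sketch below transcribes the reported coefficients into a function of the already standardized inputs; the function name and the sample inputs are illustrative.

```python
def surrogate_test_f2(t1, t2, t3):
    """Predicted test F2 score from standardized training FPR (t1),
    training FNR (t2) and training F2 score (t3)."""
    return (0.62
            - 1.19 * t2 * t3
            - 0.89 * t3 ** 2
            - 0.40 * t2 ** 2
            - 0.35 * t1 * t3
            - 0.28 * t1 * t2)

# All training metrics at their mean values (standardized value 0).
at_mean = surrogate_test_f2(0.0, 0.0, 0.0)
# Training F2 half a standard deviation above the mean, FNR half below.
shifted = surrogate_test_f2(0.0, -0.5, 0.5)
print(at_mean, shifted)
```

Because every non-constant term is a product of two standardized inputs, the prediction depends on how the training metrics deviate from their averages jointly, not on any single metric in isolation.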
3.3 PSO-SVM Optimization and Final Model
I implemented the PSO algorithm to optimize the SVM hyperparameters (\( C^+, C^-, \) and kernel parameter \( \gamma \)). Each particle’s position encoded a potential hyperparameter set. The fitness of a particle was evaluated by: 1) training an SVM with those parameters, 2) calculating the training FPR, FNR, and F2 score, 3) standardizing these metrics, and 4) plugging them into the surrogate model \( f(t) \) to obtain a predicted test F2 score. The PSO swarm iteratively converged towards the hyperparameter set that maximized this predicted score.
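The four-step fitness evaluation can be sketched as a small wrapper. Everything passed into it here is a stand-in: `stub_train_and_score` replaces the actual SVM training, `stub_stats` holds hypothetical means and standard deviations (the values from the 5,000 runs are not reported), and `stub_surrogate` is a placeholder for \( f(t) \).

```python
def make_fitness(train_and_score, train_stats, surrogate):
    """Build a PSO fitness function.

    train_and_score(c_pos, c_neg, gamma) -> (train FPR, train FNR, train F2)
    train_stats: [(mean, std), ...] for the three metrics, in that order
    surrogate(t1, t2, t3) -> predicted test F2 score
    """
    def fitness(position):
        c_pos, c_neg, gamma = position
        # Steps 1-2: train the SVM and compute the training metrics.
        metrics = train_and_score(c_pos, c_neg, gamma)
        # Step 3: standardize each metric with its reference mean and std.
        t = [(m - mu) / sd for m, (mu, sd) in zip(metrics, train_stats)]
        # Step 4: predicted test F2 score from the surrogate model.
        return surrogate(*t)
    return fitness

# Stand-ins for illustration only (hypothetical values throughout).
def stub_train_and_score(c_pos, c_neg, gamma):
    return 0.10, 0.05, 0.90

def stub_surrogate(t1, t2, t3):
    return 0.5 - 0.1 * t2 + 0.1 * t3

stub_stats = [(0.10, 0.05), (0.10, 0.05), (0.80, 0.10)]

fitness = make_fitness(stub_train_and_score, stub_stats, stub_surrogate)
score = fitness([10.0, 1.0, 0.1])
print(score)
```

Keeping the trainer and the surrogate as injected callables means the test set never enters the optimization loop; it is touched only once, after the swarm has converged.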
The optimal hyperparameters identified by the PSO algorithm yielded the following performance on a held-out test set:
| Dataset | FPR | FNR | F2 Score | Predicted F2 (by surrogate) |
|---|---|---|---|---|
| Training Set | 0.110 | 0.015 | 0.920 | – |
| Test Set (Actual) | 0.229 | 0.186 | 0.727 | 0.752 |
The PSO-optimized SVM model improved on the average SVM performance from the randomized experiments, raising the test F2 score from 0.683 to 0.727. Crucially, it reduced the test FNR from 22.4% to 18.6%, a relative reduction of about 17% in missed slag inclusion slabs. The surrogate overestimated the test F2 score by 0.025 (0.752 predicted versus 0.727 actual), a discrepancy small enough to support the surrogate model’s utility in guiding the optimization. The model outputs a probability value between 0 and 1. For practical deployment, I recommend a stratified action strategy based on this probability \( p \):
- \( p < 0.4 \): Slab classified as defect-free. No inspection required.
- \( 0.4 \leq p < 0.7 \): Slab classified as suspect. Implement statistical sampling inspection.
- \( p \geq 0.7 \): Slab classified as high-risk for slag inclusions. Mandatory 100% inspection and grinding if confirmed.
This strategy balances inspection resource allocation with the imperative to catch defective product, directly contributing to enhanced rolled product quality stability.
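The three probability bands map directly onto a small decision function. The sketch below uses the thresholds stated above; the function name and the returned labels are illustrative.

```python
def inspection_action(p):
    """Map a predicted slag inclusion probability to an inspection action."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p < 0.4:
        return "defect-free: no inspection"
    if p < 0.7:
        return "suspect: sampling inspection"
    return "high-risk: 100% inspection"

for p in (0.15, 0.40, 0.69, 0.70):
    print(p, inspection_action(p))
```

The thresholds 0.4 and 0.7 are kept as literals here for clarity; in a deployed system they would more naturally be configuration values that can be tuned against inspection capacity.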
4. Conclusion
In this study, I have developed and optimized a machine learning model for the online prediction of surface slag inclusions in continuous casting slabs. The core contributions are twofold: first, the demonstration that a class-weighted SVM algorithm provides a superior balance between model fitting and generalization compared to Random Forest and AdaBoost for this specific imbalanced classification task; and second, the novel methodology of using a surrogate model to explicitly link training performance metrics to expected test performance, enabling the effective use of Particle Swarm Optimization for hyperparameter tuning without requiring test set information during the optimization loop.
The finalized PSO-SVM model achieved a test false negative rate of 18.6% and an F2 score of 0.727, representing a significant enhancement in reliability for identifying slabs with slag inclusions. By establishing the relationship \( f(t) = 0.62 - 1.19 t_2 t_3 - 0.89 t_3^2 - 0.40 t_2^2 - 0.35 t_1 t_3 - 0.28 t_1 t_2 \), this work provides a principled framework for model parameter optimization that directly targets improved practical application accuracy. The implementation of the resulting model within a slab quality analysis system, coupled with the proposed stratified inspection protocol, enables the proactive identification and removal of defective slabs. This directly reduces the incidence of hereditary rolling defects caused by slag inclusions, thereby increasing metal yield and strengthening the stability of final product quality in steel manufacturing.
