Traceability and Root Cause Analysis for Quality of Casting Parts

In the aerospace industry, the production of high-quality casting parts is critical, especially for components like titanium alloy castings used in engines. These casting parts often suffer from defects such as shrinkage porosity, hot cracks, and inclusions due to the long and complex casting processes, leading to high scrap rates and potential safety hazards. As a researcher focused on intelligent manufacturing, I aim to develop methods for tracing and identifying the root causes of defects in casting parts, leveraging data analytics and machine learning. This study explores the use of process data collected from an ERP system to build predictive models that link multi-process parameters to defect occurrences in casting parts, ultimately enhancing quality control and reducing waste in aerospace casting production.

The foundation of this research lies in the comprehensive data extraction from the Huazhu ERP system, which implements a single-piece lifecycle management model for casting parts. Each casting part is assigned a unique identifier, allowing its process parameters to be tracked across the production stages, including rapid prototyping, wax dipping, shell building, dewaxing, pre-sintering, and vacuum melting and pouring. This data-driven approach enables the collection of detailed parameters that influence the quality of casting parts. After data cleaning and handling missing values, 176 complete samples were obtained, each with 36 input parameters derived from multiple processes. To address the high dimensionality, particularly from the shell-building process, which involves 10 layers with 4 parameters each (40 parameters in total), principal component analysis (PCA) was applied to reduce these 40 parameters to 17 principal components while retaining most of their variance. The PCA transformation can be expressed as:

$$ Z = XW $$

where $ X $ is the original (mean-centered) data matrix of process parameters for casting parts, $ W $ is the matrix whose columns are the leading eigenvectors of the covariance matrix, and $ Z $ contains the principal component scores. This reduction streamlined the dataset for subsequent modeling, which focused on three key defect types in casting parts: shrinkage, cracks, and inclusions. The defect labels were one-hot encoded for the classification tasks.
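As an illustration of the projection $ Z = XW $, the following is a minimal sketch using NumPy. The data are random placeholders with the same shape as the shell-building block (176 samples, 40 parameters), and the function name `pca_scores` is hypothetical; the study's actual implementation is not described in the text.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components (Z = Xc W)."""
    Xc = X - X.mean(axis=0)                  # center each parameter column
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the parameters
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                    # columns are the leading eigenvectors
    explained = eigvals[order].sum() / eigvals.sum()  # share of variance retained
    return Xc @ W, W, explained

rng = np.random.default_rng(0)
X_shell = rng.normal(size=(176, 40))         # synthetic placeholder, NOT the study's data
Z, W, explained = pca_scores(X_shell, n_components=17)
print(Z.shape, W.shape)                      # (176, 17) (40, 17)
```

The `explained` value gives the fraction of total variance kept by the selected components, which is one common way to decide how many components to retain.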

With the processed data, I developed three machine learning models to predict defects in casting parts: a multilayer perceptron (MLP) neural network, a random forest, and an XGBoost model. The goal was to compare their performance in tracing quality issues back to process parameters. For the MLP model, I experimented with various architectures, activation functions, and optimizers. The optimal structure for predicting shrinkage defects in casting parts was a 36×22×10×3 network (36 inputs, hidden layers of 22 and 10 units, and 3 outputs) using ReLU activation and the Adam optimizer, with a learning rate of 0.01 and 300 training iterations. The loss function minimized during training is the categorical cross-entropy:

$$ L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$

where $ N $ is the number of samples, $ C $ is the number of defect classes for casting parts, $ y_{i,c} $ is the true label, and $ \hat{y}_{i,c} $ is the predicted probability. For crack and inclusion defects in casting parts, similar tuning led to structures of 36×32×16×3 and 36×32×10×3, respectively. The MLP model showed high accuracy for individual defects but struggled with multi-defect predictions due to data imbalances.
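The cross-entropy loss $ L $ can be checked by hand on a small case. The sketch below uses two hypothetical samples with three one-hot classes; the probabilities are made up for illustration and are not model outputs from the study.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy over N samples and C classes."""
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            total += y * math.log(max(p, eps))  # eps guards against log(0)
    return -total / len(y_true)

# Hypothetical one-hot labels and predicted class probabilities.
y_true = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]

loss = cross_entropy(y_true, y_prob)
print(round(loss, 4))  # 0.2899, i.e. -(ln 0.7 + ln 0.8) / 2
```

Only the probability assigned to the true class contributes to each sample's term, so confident wrong predictions are penalized most heavily.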

The random forest model was configured with parameters such as the number of estimators (trees) and maximum depth. Through grid search, I found that for shrinkage prediction in casting parts, 46 trees with a depth of 14 yielded the best accuracy. The random forest algorithm aggregates predictions from multiple decision trees, where the output for a casting part defect class $ c $ is given by:

$$ \hat{y}_c = \frac{1}{T} \sum_{t=1}^{T} f_{t,c}(x) $$

with $ T $ as the number of trees and $ f_{t,c}(x) $ as the prediction of tree $ t $ for class $ c $ (a vote or a class probability). The predicted class is the one with the largest $ \hat{y}_c $. This ensemble approach reduced overfitting and improved generalization for casting parts. Similarly, for cracks and inclusions, optimal parameters were identified by grid search, enhancing the model’s ability to handle imbalanced data in casting parts.
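The averaging step of the ensemble can be sketched in a few lines. The per-tree outputs below are hypothetical votes from four trees over three classes, not results from the trained forest.

```python
def forest_predict(tree_outputs):
    """Average per-tree class scores and return (averaged scores, winning class)."""
    n_trees = len(tree_outputs)
    n_classes = len(tree_outputs[0])
    averaged = [
        sum(tree[c] for tree in tree_outputs) / n_trees
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, best_class

# Hypothetical hard votes from T = 4 trees (one-hot per tree).
tree_outputs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]

averaged, best_class = forest_predict(tree_outputs)
print(averaged, best_class)  # [0.75, 0.25, 0.0] 0
```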

The XGBoost model, a gradient boosting method, was tuned with parameters like subsample ratio, feature ratio, and learning rate. The objective function for XGBoost when predicting defects in casting parts includes a regularization term:

$$ \text{Obj} = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) $$

where $ l $ is the loss function, $ \Omega $ penalizes model complexity, and $ K $ is the number of trees. After adjustment, a subsample ratio of 0.4 and feature ratio of 0.5 with a learning rate of 0.1 provided the best performance for casting parts. The model was optimized for different defect types, but like the others, it faced challenges with data scarcity for certain classes in casting parts.
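For reference, the complexity penalty in the standard XGBoost formulation is commonly written as follows. The text does not say whether these penalties were tuned in the study, so $ \gamma $, $ \lambda $, $ J_k $, and $ w_{k,j} $ are symbols introduced here for illustration only:

$$ \Omega(f_k) = \gamma J_k + \frac{1}{2} \lambda \sum_{j=1}^{J_k} w_{k,j}^2 $$

where $ J_k $ is the number of leaves in tree $ k $, $ w_{k,j} $ is the score assigned to leaf $ j $ of that tree, $ \gamma $ penalizes each additional leaf, and $ \lambda $ is the L2 penalty on leaf scores. Larger values of either coefficient favor smaller, smoother trees, which matters when only 176 samples are available.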

To evaluate these models for casting parts, I used metrics such as accuracy and recall, computed as:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$

where $ TP $, $ TN $, $ FP $, and $ FN $ denote true positives, true negatives, false positives, and false negatives for defect predictions in casting parts. The table below reports the accuracy and recall of each model when predicting individual defects in casting parts:

Defect Type	Model	Accuracy	Recall
Shrinkage	MLP Neural Network	0.72	0.73
Shrinkage	Random Forest	0.80	0.79
Shrinkage	XGBoost	0.75	0.78
Cracks	MLP Neural Network	0.97	0.94
Cracks	Random Forest	0.99	0.95
Cracks	XGBoost	0.97	0.97
Inclusions	MLP Neural Network	0.88	0.88
Inclusions	Random Forest	0.87	0.82
Inclusions	XGBoost	0.88	0.87
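The accuracy and recall definitions above can be computed directly from confusion counts. The sketch below uses a short list of hypothetical binary labels (1 = defect present, 0 = defect absent); it is not the study's test set.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 marks a defect."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # Defined as 0.0 when there are no actual positives to recover.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical ground truth and predictions for eight castings.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                               # 3 3 1 1
print(accuracy(tp, tn, fp, fn), recall(tp, fn))     # 0.75 0.75
```

Recall is the more informative of the two for rare defect classes, since a model that never predicts the defect can still score a high accuracy.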

For multi-defect predictions in casting parts, the models were assessed on combined tasks. The table below compares their accuracy for simultaneous defect detection in casting parts:

Prediction Task	MLP Neural Network Accuracy	Random Forest Accuracy	XGBoost Accuracy
Cracks + Inclusions	0.85	0.85	0.86
Shrinkage + Cracks + Inclusions	0.60	0.70	0.65

A comprehensive scoring system was applied to weigh the models’ performance across different defects in casting parts, accounting for data imbalances. The score $ S $ is calculated as:

$$ S = w_1 \cdot A_1 + w_2 \cdot A_2 + w_3 \cdot A_3 + w_m \cdot A_m $$

where $ w_1, w_2, w_3 $ are weights for shrinkage, cracks, and inclusions in casting parts (set to 0.5, 0.2, and 0.3 based on data balance), $ A_1, A_2, A_3 $ are accuracies for individual defects, $ w_m $ is the weight for multi-defect accuracy, and $ A_m $ is the multi-defect accuracy. The results showed that random forest achieved the highest overall score (0.7795), followed by XGBoost (0.7475) and MLP (0.7090), indicating that random forest is the best suited of the three models to quality traceability in casting parts.
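The text does not state the value of $ w_m $ or which multi-defect task supplies $ A_m $. As a worked check, the sketch below assumes one possible reading that is not stated in the source: the weighted single-defect accuracy and the three-defect accuracy each contribute half of the score. The accuracies are taken from the two tables above; the constants `W_SINGLE` and `W_MULTI` are assumptions.

```python
# Defect weights from the text.
W_DEFECT = {"shrinkage": 0.5, "cracks": 0.2, "inclusions": 0.3}

# ASSUMPTION (not given in the source): equal split between the weighted
# single-defect accuracy and the three-defect accuracy.
W_SINGLE, W_MULTI = 0.5, 0.5

# Accuracies copied from the tables; "multi" is the three-defect task.
ACCURACY = {
    "MLP":           {"shrinkage": 0.72, "cracks": 0.97, "inclusions": 0.88, "multi": 0.60},
    "Random Forest": {"shrinkage": 0.80, "cracks": 0.99, "inclusions": 0.87, "multi": 0.70},
    "XGBoost":       {"shrinkage": 0.75, "cracks": 0.97, "inclusions": 0.88, "multi": 0.65},
}

def score(acc):
    """Comprehensive score under the assumed 50/50 weighting."""
    single = sum(W_DEFECT[d] * acc[d] for d in W_DEFECT)
    return W_SINGLE * single + W_MULTI * acc["multi"]

scores = {name: round(score(acc), 4) for name, acc in ACCURACY.items()}
print(scores)  # {'MLP': 0.709, 'Random Forest': 0.7795, 'XGBoost': 0.7415}
```

Under this reading the MLP and random forest values match the reported 0.7090 and 0.7795, while XGBoost comes out at 0.7415 rather than the reported 0.7475, so the assumed weighting should be treated as a plausible reconstruction rather than the confirmed scheme. The ranking of the three models is the same either way.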

The application of these models for quality traceability in casting parts involves inputting process parameters to predict defect occurrences. By analyzing the parameter combinations that lead to defects, we can identify root causes. For instance, in casting parts, shrinkage defects may be linked to parameters like pouring temperature or pre-sintering time, while cracks could relate to laser power in rapid prototyping or drying time in shell building. The random forest model, with its high accuracy, allows for reliable inverse analysis: given a defect in a casting part, we can trace back to the most influential process parameters using feature importance scores. The importance $ I_j $ for parameter $ j $ in casting parts is computed as:

$$ I_j = \frac{1}{T} \sum_{t=1}^{T} \Delta \text{Impurity}_t(j) $$

where $ \Delta \text{Impurity}_t(j) $ is the decrease in impurity (e.g., Gini index) due to splits on parameter $ j $ in tree $ t $. This helps pinpoint critical control points in the production of casting parts, such as optimizing wax dipping temperature or adjusting vacuum levels during melting to reduce defect rates.
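To make the importance formula concrete, the following minimal sketch computes the Gini decrease for one split and then averages per-tree decreases into $ I_j $. All labels, parameter indices, and per-tree totals are hypothetical. Library implementations such as scikit-learn additionally weight each split by the share of samples reaching the node and normalize the result, which this sketch omits.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def impurity_decrease(parent, left, right):
    """Gini decrease achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

def mean_importance(per_tree_decrease, n_params):
    """I_j: average over trees of the impurity decrease credited to parameter j."""
    n_trees = len(per_tree_decrease)
    return [
        sum(tree.get(j, 0.0) for tree in per_tree_decrease) / n_trees
        for j in range(n_params)
    ]

# Hypothetical split on parameter 0: defective (1) vs. sound (0) castings.
parent = [1, 1, 0, 0]
split_gain = impurity_decrease(parent, left=[1, 1], right=[0, 0])  # perfect split

# Hypothetical per-tree totals: {parameter index: summed impurity decrease}.
per_tree_decrease = [{0: split_gain}, {0: 0.25, 1: 0.10}]
importance = mean_importance(per_tree_decrease, n_params=2)

# Rank parameters from most to least influential.
ranking = sorted(range(2), key=lambda j: importance[j], reverse=True)
print(split_gain, importance, ranking)  # 0.5 [0.375, 0.05] [0, 1]
```

Parameters at the top of such a ranking are the first candidates to inspect when a defect is traced back to the process.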

In conclusion, this study demonstrates the effectiveness of machine learning models for traceability and root cause analysis in the quality management of casting parts. By integrating data from an ERP system and applying dimensionality reduction via PCA, we built predictive models that correlate multi-process parameters with defects in casting parts. Among the models, random forest achieved the highest comprehensive score and the best accuracy and recall for shrinkage, the most balanced defect class, while all models faced challenges with the imbalanced data for cracks and inclusions. The methodology can support monitoring and decision-making in casting parts production, with the aim of reducing scrap rates and improving aerospace casting quality. Future work could involve collecting more data to address imbalances and exploring deep learning architectures to capture more complex relationships in casting parts. Ultimately, this approach paves the way for intelligent manufacturing systems that ensure the reliability and safety of critical casting parts in aerospace and beyond.