Data Mining for Quality Intelligence in Sand Casting Production: A Model and Its Application

Since the start of the 21st century, the steady advancement of computer technology has made digitalization, networking, and intelligent automation the primary engines of manufacturing development. This technological wave offers new pathways for quality control. Many enterprises have adopted integrated information management systems such as Enterprise Resource Planning (ERP) and Manufacturing Execution Systems (MES) to govern their operations.

As a pivotal branch of foundry, sand casting holds a position of critical importance in sectors like the automotive industry, particularly in the production of key components such as engine cylinder heads. The manufacturing of sand casting products is characterized by complex, multi-stage, and high-volume production processes. These inherent characteristics frequently lead to significant quality fluctuations and formidable challenges in tracing the root causes of defects. In response to pressures from industrial upgrading, environmental regulations, and evolving customer demands, sand casting enterprises are actively transitioning towards automated and information-driven production. The shop floor is now equipped with numerous data acquisition devices interfaced with ERP, MES, and other management systems. These systems archive vast quantities of authentic production data encompassing equipment status, process parameters, environmental conditions, and quality metrics—a veritable treasure trove containing the intricate relationships between production inputs and final product quality. However, a pervasive issue persists: this accumulated “wealth” of data largely remains untransformed into actionable information and knowledge. Only a minuscule fraction is actively utilized, leaving enterprises in a paradoxical state of “data explosion but knowledge scarcity.” A deep, analytical mining of this collected data to uncover the coupled relationships among various production stages and parameters would undoubtedly empower foundries to achieve superior quality control and effective defect traceability. Therefore, implementing data mining within sand casting enterprises is of profound significance.

Data mining is the process of analyzing extensive existing datasets to reveal meaningful new relationships, trends, and patterns. It is the practice of extracting potentially valuable, previously unknown, and comprehensible information from random, massive, noisy, incomplete, and ambiguous large-scale databases. It serves as a crucial decision-support process. While the field often refers to this as Knowledge Discovery in Databases (KDD), data mining is fundamentally a core step within that broader discovery process. As an emerging technology, data mining is navigating its path to maturity, requiring dedicated research and development to reach its full potential and widespread acceptance. Its application has already proven valuable across diverse domains such as financial management, insurance, and management systems research. Nevertheless, its adoption within the foundry industry, particularly for sand casting products, remains in a nascent stage.

A critical prerequisite for effective data mining is the availability of substantial data volumes. The stability and reliability of mining outcomes generally improve as data quantity increases, provided the data are representative of actual production, yielding insights closer to real-world conditions. ERP and MES systems are well positioned to provide this data support. Consequently, integrating data mining capabilities on top of an ERP foundation is an important strategic direction for strengthening quality control and raising the quality of sand casting products.

In this context, a data mining methodology built upon a dedicated foundry ERP system is proposed. This methodology is designed to perform deep analytical mining on all collected production data, with the goal of empowering sand casting enterprises to improve product quality and operational efficiency.

1. The Foundry ERP System Framework

The core system underpinning this approach is a specialized foundry ERP system developed with the explicit goals of standardizing management, boosting efficiency, reducing costs, accelerating digital transformation, and enhancing market responsiveness for casting enterprises. It facilitates comprehensive information integration across the organization. The system’s workflow is customer-centric and task-driven, utilizing orders to pull production. It enables holistic management of procurement, production, sales, and inventory, making enterprise operations more streamlined and effective. The foundational business logic framework is illustrated conceptually below.

Consider a real-world application at Company Y, a subsidiary specializing in high-volume production of diesel engine components like cylinder blocks, cylinder heads, crankshafts, and flywheels. Due to stringent quality control requirements, the company implemented this ERP system, managing each cast piece individually (single-piece management). The adapted business process is as follows: Order Entry → Setup of Processing Routes, Bill of Materials (BOM) Creation, Casting Process Assignment → Production Preparation (including mold management, raw material management, and procurement) → Planned Production Launch → Production Processing → Sales and Shipping.

2. The Data Mining Methodology

The data mining workflow is structured into three interconnected parts: data acquisition, the data mining model, and the presentation of results.

2.1 Data Acquisition

Data from every production stage is fed into the system through a combination of automated equipment data capture and manual entry, establishing the data foundation necessary for mining. This includes modules for automatic parameter logging from machines, manual order entry, digital process cards, quality defect registration, and shipping documentation.

2.2 The Data Mining Model

The arsenal of data mining techniques includes neural networks, decision trees, association analysis, rough sets, fuzzy sets, statistical analysis, and visualization. The ERP system’s core relies on database technology, specifically SQL Server—a relational database management system. SQL Server organizes data into interrelated two-dimensional tables via relational models, making it exceptionally well-suited for association analysis methods. Therefore, the proposed mining model primarily leverages association analysis supplemented by neural network methods.

The model’s architecture is pivotal. It uses the unique Casting Identification Number (Single Piece ID) as the primary input key. This key forms the first-level association with critical records: the Sales Order, Quality Inspection Records, and Production Launch Records. The Sales Order can be further traced upward to Customer information and Shipping Records, enabling deep analysis of customer demand patterns. The Production Launch Record links to the specific Process Card detailing the manufacturing instructions. The Quality Record serves a dual purpose: firstly, it can be associated via timestamps with all process parameter logs from various production stages, which, when combined with the Process Card, allows for comprehensive process mining and analysis. Secondly, the quality data itself can be analyzed to determine defect root causes, identify bottleneck processes, and perform other quality-centric analyses. The fundamental relationship can be conceptually represented by a function mapping inputs to a network of associated data objects:

$$ \text{Casting ID} \rightarrow f(\text{Order}, \text{Quality Records}, \text{Production Records}) $$

$$ \text{Order} \rightarrow g(\text{Customer}, \text{Shipping}) $$
$$ \text{Production Records} \rightarrow h(\text{Process Card}) $$
$$ \text{Quality Records} \rightarrow i(\text{Process Parameters}, \text{Defect Analysis}) $$

Based on this model, the key database tables include the Order Details table, Planned Production table, Production Quality Inspection table, Production Parameter Log table, and the Shipping Bill. Their relational structure is summarized in the table below.

Table Name	Primary Key	Key Foreign Keys & Related Fields	Description
Order Details	Order_ID, Item_Line	Part_Number, Customer_ID	Contains customer order information for specific sand casting products.
Planned Production	Production_Lot_ID, Casting_ID	Order_ID, Process_Card_ID	Links an individual casting to an order and its assigned process instructions.
Production Quality Inspection	Inspection_ID	Casting_ID, Defect_Code, Timestamp	Records the final quality status and defects for each sand casting product.
Production Parameter Log	Log_ID	Machine_ID, Parameter_Name, Timestamp, Value	Stores time-series data of process parameters (e.g., temperature, pressure) from equipment.
Shipping Bill	Shipping_ID	Order_ID, Casting_ID (often via lot)	Tracks the dispatch of finished sand casting products to customers.

The linkage, often established via primary and foreign keys (e.g., Casting_ID), enables traversing the entire production history of a specific sand casting product.
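
The key-based traversal described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, the table and column names are simplified from the summary table, and the column types, the `trace_casting` helper, and all sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical, simplified schema: names follow the summary table above,
# but column types and sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_details (
    order_id TEXT, item_line INTEGER, part_number TEXT, customer_id TEXT,
    PRIMARY KEY (order_id, item_line));
CREATE TABLE planned_production (
    production_lot_id TEXT, casting_id TEXT, order_id TEXT, process_card_id TEXT,
    PRIMARY KEY (production_lot_id, casting_id));
CREATE TABLE quality_inspection (
    inspection_id INTEGER PRIMARY KEY, casting_id TEXT, defect_code TEXT, ts TEXT);
""")
conn.execute("INSERT INTO order_details VALUES ('SO-1', 1, 'CH-100', 'CUST-A')")
conn.execute("INSERT INTO planned_production VALUES ('LOT-7', 'C-0001', 'SO-1', 'PC-12')")
conn.execute(
    "INSERT INTO quality_inspection VALUES (1, 'C-0001', 'POROSITY', '2024-01-05T10:30:00')"
)

def trace_casting(conn, casting_id):
    """Return order, customer, process card, and inspection data for one casting."""
    return conn.execute(
        """
        SELECT p.casting_id, p.order_id, o.customer_id,
               p.process_card_id, q.defect_code, q.ts
        FROM planned_production AS p
        JOIN order_details AS o ON o.order_id = p.order_id
        LEFT JOIN quality_inspection AS q ON q.casting_id = p.casting_id
        WHERE p.casting_id = ?
        """,
        (casting_id,),
    ).fetchone()

print(trace_casting(conn, "C-0001"))
```

The `LEFT JOIN` keeps castings that have not yet been inspected. In a real schema an order with several item lines would also need a join on the item line or part number to avoid duplicate rows.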

2.3 Data Preprocessing for Mining

Raw industrial data is seldom mining-ready, so the data extracted via the association model must be preprocessed first. Key tasks include handling missing values in parameter logs, normalizing parameters recorded on different scales (e.g., temperature in °C, pressure in bar), encoding categorical data (e.g., defect types, mold identifiers), and aligning time-series process data with the final quality record of the corresponding casting. For a set of $n$ castings, each with $m$ associated process parameters and one quality label (e.g., pass/fail or defect code), we create a cleaned feature matrix $X$ and target vector $Y$:

$$
X = \begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,m}
\end{bmatrix}, \quad
Y = \begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{bmatrix}
$$

where $x_{i,j}$ is the preprocessed value of the $j$-th parameter for the $i$-th casting, and $y_i$ is its quality outcome. Common preprocessing steps are summarized below.

Step	Action	Example/Technique
Cleaning	Handle missing/null values.	Mean/median imputation, or record exclusion.
Normalization	Scale numerical features to a common range.	Min-Max scaling: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
Encoding	Convert categorical text to numbers.	One-Hot Encoding for defect types.
Alignment	Sync process data timestamps with casting ID.	Time-window aggregation (e.g., average parameter during pouring).
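
The four steps in the table can be sketched in plain Python as shown below. This is an illustrative sketch under stated assumptions rather than the pipeline used in the ERP system: the function names, the `None` convention for missing values, and all sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean

def impute_mean(values):
    """Cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Normalization: scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(labels):
    """Encoding: return (sorted categories, one-hot rows) for a list of labels."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

def window_mean(log, start, end):
    """Alignment: average logged values whose timestamp lies in [start, end]."""
    in_window = [value for ts, value in log if start <= ts <= end]
    return mean(in_window) if in_window else None

# Invented sample data: pouring temperatures (deg C) with one missing reading.
temps = impute_mean([1400.0, None, 1450.0, 1500.0])
scaled = min_max_scale(temps)
categories, encoded = one_hot(["POROSITY", "OK", "SAND_INCLUSION", "OK"])

# Invented parameter log: (timestamp, value) pairs from one machine.
log = [
    (datetime(2024, 1, 5, 10, 0), 1440.0),
    (datetime(2024, 1, 5, 10, 5), 1460.0),
    (datetime(2024, 1, 5, 11, 0), 1300.0),
]
pour_avg = window_mean(log, datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 10))
print(scaled, categories, pour_avg)
```

Each casting's scaled parameters and window averages would form one row of $X$, and its encoded quality outcome the matching entry of $Y$.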

2.4 Presentation of Mining Results

The analytical results derived from mining are presented through a combination of detailed data tables and intuitive statistical charts. These can include tables summarizing defect rates by product line or machine, scatter plots depicting daily production output trends, bar charts visualizing the distribution of different defect types, and pie charts showing monthly production share of various sand casting products. This multi-format presentation caters to different analytical needs, from detailed audits to high-level overviews.
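
As a rough illustration of how such summaries might be produced before charting, the sketch below computes a defect rate per machine and a defect type distribution, then prints a simple text bar chart. The record layout, the `"OK"` convention for defect-free castings, and the sample data are assumptions for the example.

```python
from collections import Counter

# Hypothetical inspection records: (machine_id, defect_code); "OK" means no defect.
records = [
    ("M1", "OK"), ("M1", "POROSITY"), ("M1", "OK"), ("M1", "OK"),
    ("M2", "SAND_INCLUSION"), ("M2", "POROSITY"), ("M2", "OK"), ("M2", "OK"),
]

def defect_rate_by_machine(records):
    """Return {machine_id: defective castings / inspected castings}."""
    totals, defects = Counter(), Counter()
    for machine, code in records:
        totals[machine] += 1
        if code != "OK":
            defects[machine] += 1
    return {m: defects[m] / totals[m] for m in sorted(totals)}

def defect_distribution(records):
    """Count occurrences of each defect type, ignoring defect-free castings."""
    return Counter(code for _, code in records if code != "OK")

rates = defect_rate_by_machine(records)
distribution = defect_distribution(records)

for machine, rate in rates.items():
    print(f"{machine}  {rate:6.1%}")
for code, count in distribution.most_common():
    print(f"{code:<16}{'#' * count} {count}")
```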

2.5 Predictive Modeling using Neural Networks

The established data associations create a powerful mapping: a set of process parameters (inputs) is linked to a final quality result (output). This mapping can be used to train a predictive model. Artificial Neural Networks (ANNs) are particularly adept at learning complex, non-linear relationships in high-dimensional data, making them suitable for quality prediction in sand casting.

A basic feedforward neural network can be constructed. The preprocessed feature vector $\vec{x}_i$ for a casting serves as the input layer. One or more hidden layers with activation functions (e.g., ReLU) learn intermediate representations. The output layer provides the prediction (e.g., a probability of being defective, or a regression value for a dimensional deviation).

The computation for a neuron in a hidden layer is given by:
$$ z = \sum_{j=1}^{k} (w_j \cdot a_j) + b $$
$$ a = \sigma(z) $$
where $w_j$ are weights, $a_j$ are inputs from the previous layer, $b$ is the bias, and $\sigma$ is the activation function.
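
The neuron computation above translates directly into code. The following is a minimal sketch with hypothetical function names and invented weights, intended only to make the two equations concrete.

```python
import math

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic function, often used for a defect-probability output."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Compute a = sigma(sum_j w_j * a_j + b) for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return activation(z)

def layer(inputs, weight_rows, biases, activation):
    """Apply one neuron per (weights, bias) pair to the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Invented example: two scaled inputs feeding a two-neuron ReLU hidden layer.
hidden = layer([0.5, 1.0], [[2.0, -1.0], [-2.0, -1.0]], [0.5, 0.5], relu)
print(hidden)
```

The second neuron's weighted sum is negative, so ReLU clips its output to zero, which is the non-linearity that lets stacked layers model more than a linear map.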

The network is trained by minimizing a loss function $L$ (e.g., Mean Squared Error for regression, Cross-Entropy for classification) over the dataset:
$$ L(W, b) = \frac{1}{n} \sum_{i=1}^{n} \ell(\hat{y}_i, y_i) $$
where $\hat{y}_i$ is the network’s prediction for the $i$-th sample, $y_i$ is the actual quality label, and $W, b$ represent all network weights and biases. Optimization algorithms like Adam or SGD are used to adjust $W$ and $b$ to minimize $L$.
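
To make the loss and its minimization concrete, the sketch below implements mean squared error and binary cross-entropy as instances of $\ell$, then fits a single sigmoid neuron with plain full-batch gradient descent. This is a deliberately simplified stand-in for training a full network with Adam or SGD; the function names, learning rate, step count, and data are all assumptions.

```python
import math

def mse(y_pred, y_true):
    """Mean squared error, a common loss for regression targets."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def binary_cross_entropy(y_pred, y_true, eps=1e-12):
    """Mean cross-entropy for 0/1 labels; predictions are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(y_pred, y_true):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_single_neuron(xs, ys, lr=0.5, steps=200):
    """Full-batch gradient descent on one sigmoid neuron; returns (w, b, losses)."""
    w, b, history = 0.0, 0.0, []
    n = len(xs)
    for _ in range(steps):
        preds = [sigmoid(w * x + b) for x in xs]
        history.append(binary_cross_entropy(preds, ys))
        # Gradient of the mean cross-entropy for a sigmoid output.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Invented data: one scaled process parameter and a 0/1 defect label.
xs = [0.0, 0.2, 0.8, 1.0]
ys = [0, 0, 1, 1]
w, b, history = train_single_neuron(xs, ys)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The recorded loss should fall from its starting value as $w$ and $b$ are updated, which is the behavior the optimization step is meant to produce.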

Once trained, the model can predict the quality outcome $\hat{y}$ for a new, unseen set of process parameters $\vec{x}_{new}$ before the sand casting product is even made:
$$ \hat{y} = \text{ANN}_{\text{trained}}(\vec{x}_{new}) $$
This enables proactive quality assessment, allowing process engineers to simulate and evaluate the impact of different parameter settings on the predicted quality of sand casting products, facilitating process optimization and defect prevention.
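
A what-if evaluation of this kind might look like the sketch below. The `predict_defect_probability` function is a hypothetical stand-in for $\text{ANN}_{\text{trained}}$ with invented weights over two scaled inputs (pouring temperature and sand compactness); it is not a model fitted to real data, and `rank_settings` is likewise an illustrative helper.

```python
import math

def predict_defect_probability(x):
    """Hypothetical stand-in for a trained network: invented weights, two scaled inputs."""
    temp, compactness = x
    h1 = max(0.0, 2.0 * temp - 1.0 * compactness - 0.5)   # ReLU hidden unit 1
    h2 = max(0.0, -1.5 * temp + 1.0 * compactness + 0.2)  # ReLU hidden unit 2
    z = 1.2 * h1 + 0.8 * h2 - 1.0
    return 1.0 / (1.0 + math.exp(-z))                      # sigmoid output

def rank_settings(model, candidates):
    """Return (predicted probability, setting) pairs, lowest predicted risk first."""
    return sorted((model(c), c) for c in candidates)

# Candidate settings on a coarse grid of scaled values in [0, 1].
candidates = [(t / 4, c / 4) for t in range(5) for c in range(5)]
ranking = rank_settings(predict_defect_probability, candidates)
best_prob, best_setting = ranking[0]
print(best_setting, round(best_prob, 3))
```

In practice the candidate grid would be restricted to settings the process can actually run, and the ranking treated as a suggestion to verify on the shop floor rather than a guarantee.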

3. Application and Impact

The practical application of this data mining model within the ERP system addresses the core challenge of “data explosion but knowledge scarcity.” By implementing the association-based mining framework, the enterprise can systematically transform its accumulated production data into actionable intelligence.

From a customer perspective, mining order and shipping data reveals demand patterns, delivery performance, and potential issues with specific clients or product types. In quality management, analyzing the defect distribution linked to process cards and parameter logs allows for precise root cause analysis. For instance, a spike in porosity defects can be traced back to specific pouring temperature ranges or sand compactness values recorded on particular dates for that batch of sand casting products. Production capacity analysis, via mining of production launch and completion timestamps, identifies bottlenecks in the manufacturing line.
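
One common way to quantify a suspected link such as "high pouring temperature leads to porosity" is to compute support, confidence, and lift for the corresponding association rule. The sketch below assumes this rule-based form of association analysis, which the text does not specify; the 1450 °C threshold, the record fields, and the sample data are invented for illustration.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent(r))
    n_c = sum(1 for r in records if consequent(r))
    n_ac = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_ac / n                      # share of castings matching both sides
    confidence = n_ac / n_a if n_a else 0.0 # defect share among matching castings
    lift = confidence / (n_c / n) if n_c else 0.0  # versus the overall defect share
    return support, confidence, lift

# Hypothetical per-casting records joined from parameter logs and inspections.
records = [
    {"pour_temp_c": 1460, "defect": "POROSITY"},
    {"pour_temp_c": 1470, "defect": "POROSITY"},
    {"pour_temp_c": 1455, "defect": "POROSITY"},
    {"pour_temp_c": 1465, "defect": "OK"},
    {"pour_temp_c": 1420, "defect": "POROSITY"},
    {"pour_temp_c": 1410, "defect": "OK"},
    {"pour_temp_c": 1430, "defect": "OK"},
    {"pour_temp_c": 1400, "defect": "OK"},
]

support, confidence, lift = rule_metrics(
    records,
    antecedent=lambda r: r["pour_temp_c"] >= 1450,
    consequent=lambda r: r["defect"] == "POROSITY",
)
print(support, confidence, lift)
```

A lift above 1 indicates that porosity occurs more often in the high-temperature group than across all castings, which flags the parameter range for closer investigation but does not by itself establish cause.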

The predictive capability of the neural network model represents a significant leap from reactive to proactive quality control. By forecasting potential quality issues based on planned process parameters, corrective actions can be taken in advance—such as adjusting a furnace temperature or sand mixture—to prevent defects in the next batch of sand casting products. This not only reduces scrap and rework costs but also enhances overall equipment effectiveness (OEE) and customer satisfaction through more consistent quality.

Ultimately, this integrated approach allows the foundry to fully leverage its digital infrastructure. The ERP system transitions from being primarily a record-keeping and logistical tool to becoming the central nervous system for quality intelligence. The data mining model acts as its analytical brain, continuously learning from production outcomes to guide better decisions, optimize processes, and ensure the reliable, high-quality manufacture of complex sand casting products. This synergy between enterprise information management and advanced data analytics is key to building a more competitive, efficient, and intelligent foundry operation in the modern manufacturing landscape.