The manufacturing of high-integrity casting parts, particularly within the automotive industry using green sand foundry processes, presents a significant challenge for modern digitalization. The production lifecycle is inherently complex, involving multiple stages such as melting, sand preparation, core making, molding, and cleaning. Each stage generates vast amounts of data from diverse sources including programmable logic controllers (PLCs), sensors, supervisory systems, laboratory equipment, and manual human inputs. This data is characterized by its multi-source, heterogeneous, and dispersed nature, leading to severe information silos. Consequently, tracing the root cause of quality defects in individual casting parts or batches becomes an arduous, often manual, and error-prone task. My experience in implementing a Manufacturing Execution System (MES) has centered on overcoming these barriers by developing and applying a robust framework for multi-source heterogeneous data fusion, ultimately enabling full product life-cycle traceability and data-driven quality enhancement.

The core difficulty lies in the data’s inherent characteristics. Firstly, data is multi-source, originating from equipment with different brands and models of PLCs, various analog and digital sensors, human-machine interfaces (HMIs), enterprise resource planning (ERP) systems, and spectrometers. Secondly, it is heterogeneous in format, encompassing structured data (e.g., database tables), semi-structured data (e.g., CSV files from test equipment), and unstructured data (e.g., image files of defects or textual logs). Thirdly, data is physically and logically dispersed across different network segments, controller memories, and independent databases. Finally, data quality varies significantly; some sources provide clean, immediately usable data, while others require substantial transformation and enrichment. Without a unified method to acquire, cleanse, integrate, and contextualize this data, it remains an untapped asset rather than a driver for improvement.
Architectural Foundation for Data Integration
The foundational step towards intelligent manufacturing of casting parts is establishing a coherent data integration architecture. The goal is to create a logical or physical centralization point—often a data lake or a unified database—where standardized data is made available for upper-layer applications like production scheduling, process control, and quality management. The general architecture hinges on a powerful data acquisition and integration platform that acts as a universal translator and conduit.
This platform’s core strength is its support for a wide array of communication protocols (e.g., OPC UA, Modbus TCP, Siemens S7) and device drivers. It must handle both batch data (e.g., daily production summaries from ERP) and real-time streaming data (e.g., molten metal temperature from a sensor). It performs the critical initial task of reading raw data from its source, whether by polling a PLC memory address, reading a file from an FTP server, or consuming a message from an API. Once ingested, the platform applies pre-configured rules for data conversion and standardization before persisting the unified data set. This architecture effectively breaks down the silos, enabling a single source of truth for the manufacturing process of every casting part.
| Data Category | Source Examples | Data Format / Protocol | Integration Challenge |
|---|---|---|---|
| Process Parameters | Melting Furnace PLC, Molding Line PLC | Real-time tags (OPC, S7), Structured Logs | Diverse PLC brands, real-time streaming |
| Quality Test Results | Spectrometer, CT Scanner, Dimensional Gauge | CSV Files, Database Records, Images | File-based output, non-standardized formats |
| Material & Batch Info | ERP System, Barcode Scanners | Database Tables (SQL), API Calls | Batch synchronization, data consistency |
| Manual Operations | MES Client Terminals, Mobile Devices | Structured Forms (HTTP/HTTPS) | Data entry errors, timeliness |
| Environmental Data | Temperature/Humidity Sensors, Power Meters | Modbus TCP, MQTT | High-frequency time-series data |
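To make the table above concrete, the sketch below shows one way the acquisition platform's source registry could be expressed in code. The class, field names, protocols, and addresses are purely illustrative assumptions, not the configuration schema of any particular platform.

```python
from dataclasses import dataclass

# Hypothetical source registry for the acquisition platform. Names, protocols,
# and addresses are illustrative; a real deployment uses the platform's own schema.
@dataclass
class DataSource:
    name: str            # logical source name
    category: str        # e.g. "process", "quality", "batch", "environment"
    protocol: str        # e.g. "OPC UA", "Modbus TCP", "CSV/FTP", "SQL", "MQTT"
    address: str         # endpoint, tag path, file drop location, or table name
    mode: str            # "stream" for real-time tags, "batch" for files/tables
    poll_seconds: float  # polling or file-scan interval

SOURCES = [
    DataSource("melting_furnace_plc", "process", "OPC UA",
               "opc.tcp://10.0.1.10:4840/ns=2;s=Furnace1.MeltTemp", "stream", 1.0),
    DataSource("spectrometer_results", "quality", "CSV/FTP",
               "ftp://lab-server/spectro/outbox/", "batch", 60.0),
    DataSource("erp_batch_info", "batch", "SQL",
               "erp.production_orders", "batch", 300.0),
    DataSource("sand_plant_sensors", "environment", "Modbus TCP",
               "10.0.2.20:502/holding/40001-40010", "stream", 5.0),
]
```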
The Data Fusion Pipeline: From Raw Signals to Contextualized Information
Transforming chaotic raw data into actionable information for casting parts quality management requires a systematic pipeline. This pipeline consists of four key stages: Data Acquisition, Data Cleansing, Data Transformation, and Data Fusion.
1. Data Acquisition and Edge Processing
The first physical step is establishing reliable communication with every data source. For shop-floor equipment, this often necessitates network infrastructure upgrades, including industrial ring networks and edge computing devices. A critical consideration is network security and segmentation to prevent interference between production control networks and data acquisition networks. Each device is assigned a unique IP, and the appropriate driver is loaded into the acquisition platform.
A common example is reading a “melt start” signal from a furnace PLC. The raw data is a single bit in the PLC’s memory. The integration platform, using a configured script, reads this bit continuously. When a state change from 0 to 1 is detected, it triggers an event. The script then contextualizes this event by fetching other related data (like a heat number) and writes a structured record to a database. This process converts a low-level PLC signal into a semantically meaningful event log tied to a specific production activity for a batch of casting parts.
Below is a simplified conceptual representation of the logic. Let a raw PLC signal be a binary variable $S_{furnace}(t) \in \{0, 1\}$. The acquisition system detects an event $E_{melt\_start}$ at time $t_0$ when:
$$E_{melt\_start} = \{ t_0 | S_{furnace}(t_0^-)=0 \land S_{furnace}(t_0^+)=1 \}$$
Upon this event, the system executes a fusion function $F_{acquisition}$ which gathers context $C$ (e.g., heat ID, operator ID) and creates a unified record $R$:
$$R_{melt\_start} = F_{acquisition}(E_{melt\_start}, C) = (t_0, \text{MELT\_START}, \text{Heat\_ID}, \text{Operator\_ID}, \ldots)$$
This record $R$ is then persisted to a historical database.
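A minimal sketch of this edge-detection and contextualization logic follows, assuming a simple polling driver. The helpers `read_furnace_bit`, `lookup_context`, and `persist` are hypothetical placeholders for the platform's actual driver, MES, and database calls.

```python
import time
from datetime import datetime

def read_furnace_bit() -> int:
    """Placeholder for the driver call that reads the 'melt start' bit from the
    furnace PLC (e.g. via OPC UA or the S7 protocol). Replace with the real call."""
    return 0

def lookup_context() -> dict:
    """Placeholder for fetching related context (heat ID, operator ID) from the MES."""
    return {"heat_id": "UNKNOWN", "operator_id": "UNKNOWN"}

def persist(record: dict) -> None:
    """Placeholder for writing the fused record to the historical database."""
    print(record)

def monitor_melt_start(poll_seconds: float = 1.0) -> None:
    previous = read_furnace_bit()
    while True:
        current = read_furnace_bit()
        # Rising edge 0 -> 1 corresponds to the event E_melt_start defined above.
        if previous == 0 and current == 1:
            context = lookup_context()
            record = {
                "timestamp": datetime.now().isoformat(timespec="seconds"),
                "event": "MELT_START",
                "heat_id": context["heat_id"],
                "operator_id": context["operator_id"],
            }
            persist(record)  # the unified record R_melt_start
        previous = current
        time.sleep(poll_seconds)
```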
2. Data Cleansing and Quality Assurance
Data from diverse sources inevitably contains “dirty” elements: duplicates, null values, outliers, or logically contradictory entries. Cleansing is the process of detecting and rectifying these issues. For instance, duplicate records might arise from network re-transmissions or redundant data entry. A common cleansing task for casting parts tracking is removing duplicate entries for the same part serial number (Body Number). This is achieved through SQL queries that identify and retain only the most recent or complete record.
Let $D$ be the raw dataset containing records for casting parts. Each record has a unique key $K$ (e.g., row ID) and a part identifier $P_{id}$. The set of duplicate records for a given $P_{id}$ is:
$$D_{dup}(P_{id}) = \{ r_i, r_j \in D | r_i[P_{id}] = r_j[P_{id}] \land r_i[K] \neq r_j[K] \}$$
The cleansing rule $C_{rule}$ might be to keep the record with the maximum timestamp $T$ or the maximum $K$:
$$D_{clean} = \{ r \in D | r[K] = \max_{s \in D, s[P_{id}]=r[P_{id}]}(s[K]) \}$$
This ensures that for each unique casting part in the system, only one canonical production record exists, which is fundamental for accurate traceability.
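A hedged sketch of such a deduplication step is shown below, using SQLite purely for illustration. The table name `part_records` and the columns `row_id` (the key $K$) and `body_no` (the part identifier) are assumptions standing in for the production schema.

```python
import sqlite3

# Keep only the highest row_id per BodyNo, matching the D_clean rule above.
DEDUP_SQL = """
DELETE FROM part_records
WHERE row_id NOT IN (
    SELECT MAX(row_id)
    FROM part_records
    GROUP BY body_no
);
"""

def deduplicate(db_path: str) -> int:
    """Remove duplicate production records, leaving one canonical row per part.
    Returns the number of rows deleted."""
    with sqlite3.connect(db_path) as conn:
        before = conn.execute("SELECT COUNT(*) FROM part_records").fetchone()[0]
        conn.execute(DEDUP_SQL)
        after = conn.execute("SELECT COUNT(*) FROM part_records").fetchone()[0]
    return before - after
```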
| Data Issue | Example in Casting | Cleansing Rule / Technique |
|---|---|---|
| Duplication | Same part serial number scanned twice at different stations. | Identify by unique key (BodyNo), retain record with latest timestamp or highest RowID. |
| Missing Values | Spectrometer CSV file missing a crucial element percentage. | Flag record as “invalid test”; trigger alarm for manual re-test. Do not propagate nulls. |
| Format Inconsistency | Dates from PLC in “DDMMYYYY”, from ERP in “YYYY-MM-DD”. | Apply a unified format transformation during data ingestion (e.g., to ISO 8601). |
| Out-of-Range Values | Mold pressure sensor reading 500 MPa (physically impossible). | Define valid min/max ranges per parameter. Discard or flag values outside this range as sensor faults. |
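The rules in the table translate directly into small validation routines applied at ingestion. The sketch below illustrates range checking, date normalization, and null handling; the parameter limits, column names, and formats are assumed for illustration only.

```python
from datetime import datetime

# Illustrative valid ranges per parameter; real limits come from process engineering.
VALID_RANGES = {"mold_pressure_mpa": (0.0, 50.0), "pour_temp_c": (1300.0, 1500.0)}

def check_range(parameter: str, value: float) -> bool:
    """Flag out-of-range readings (likely sensor faults) instead of storing them."""
    low, high = VALID_RANGES[parameter]
    return low <= value <= high

def normalize_date(raw: str, source_format: str) -> str:
    """Convert a source-specific date string (e.g. '15082023' as '%d%m%Y')
    to ISO 8601 during ingestion."""
    return datetime.strptime(raw, source_format).date().isoformat()

def validate_spectro_row(row: dict, required: tuple = ("C", "Si", "Mn")) -> bool:
    """Reject spectrometer rows with missing element percentages rather than
    propagating nulls downstream; a rejected row triggers a manual re-test."""
    return all(row.get(element) not in (None, "") for element in required)
```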
3. Data Transformation and Standardization
Once cleansed, data must be transformed into a standardized format that applications can consistently understand. Different systems often use different codes for the same entity (e.g., a production line might be “LINE_A” in the PLC but “LNA” in the ERP). Transformation involves mapping these source-specific values to a master set of standard codes governed by the MES or a plant-wide data governance model.
For example, a spectrometer generates a CSV file for each test. The raw file contains columns like “Sample_ID,” “C,” “Si,” “Mn,” etc. However, it lacks context: which furnace did it come from? Which heat or lot of casting parts does it correspond to? The transformation step enriches this data. It uses the “Sample_ID” (which follows a plant-specific naming convention) to look up and append metadata: Furnace_Code, Heat_Number, Product_Code, and Timestamp. The transformed record is now a rich, contextualized data point ready for fusion and analysis.
| Raw CSV Field | Example Raw Value | Transformation Logic / Mapping | Standardized Output Field |
|---|---|---|---|
| Sample_ID | FUR-A-1023 | Parse string: prefix 'FUR-A' identifies Furnace A; suffix '1023' is the Heat Number. | Furnace_Code='A', Heat_Number='1023' |
| C | 3.45 | Direct mapping, with the unit confirmed as weight %. | Carbon_Percent=3.45 |
| Test_Time | 14:35:02 | Combine with the date from the file system or acquisition event. | Analysis_Timestamp='2023-08-15 14:35:02' |
| (Missing) | N/A | Query the MES schedule using Heat_Number to find the Product_Code. | Product_Code='ENG_BLOCK_V8' |
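A minimal sketch of this parsing and enrichment step, assuming the Sample_ID convention shown in the table and a hypothetical MES lookup for the product code; the CSV column names are likewise assumptions.

```python
import csv

def lookup_product_code(heat_number: str) -> str:
    """Placeholder for the MES schedule query that maps a heat number to the
    product being cast (e.g. an engine block variant)."""
    return "UNKNOWN"

def enrich_spectro_file(csv_path: str, file_date: str) -> list[dict]:
    """Parse a raw spectrometer CSV and append furnace, heat, product, and
    timestamp context, producing contextualized records ready for fusion."""
    records = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # e.g. "FUR-A-1023" -> prefix "FUR", furnace "A", heat number "1023"
            _, furnace, heat_number = row["Sample_ID"].split("-")
            records.append({
                "furnace_code": furnace,
                "heat_number": heat_number,
                "carbon_percent": float(row["C"]),
                "silicon_percent": float(row["Si"]),
                "product_code": lookup_product_code(heat_number),
                "analysis_timestamp": f"{file_date} {row['Test_Time']}",
            })
    return records
```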
4. Data Fusion and Contextual Binding
This is the pinnacle of the pipeline, where data from disparate sources is logically integrated to form a complete digital thread for each casting part. Fusion is not merely storing data in the same database; it is about creating explicit relationships. The primary key for this relationship is the unique identifier of the casting part (e.g., a serial number or a batch/lot ID).
During production, as the part moves through stations, its identifier is scanned or automatically associated with process data from that station. The MES system performs this binding in real-time. For instance, when a core is placed in a mold, the system binds the core’s batch ID (with its own property data) to the mold’s ID. Later, when the molten metal is poured, the pour temperature, time, and metal chemistry data (from the transformed spectrometer report) are bound to the mold ID. After shakeout, the individual casting part is stamped with a unique serial number. All preceding data bound to the mold and its processes are now inherited by this serial number.
This creates a fused data object $F_{part}$ for a casting part with serial number $S_N$:
$$F_{part}(S_N) = \{ D_{melting}(H), D_{molding}(M), D_{cores}(C), D_{pouring}(P), D_{testing}(T), \ldots \}$$
where:
- $D_{melting}(H)$: All data from heat $H$ used for this part.
- $D_{molding}(M)$: All data from mold $M$ used for this part.
- $D_{cores}(C)$: Data from core batch $C$ used in mold $M$.
- $D_{pouring}(P)$: Pouring parameters recorded for mold $M$.
- $D_{testing}(T)$: Dimensional, spectroscopic, or X-ray test results for part $S_N$.
This comprehensive fusion enables true root-cause analysis. A porosity defect found in final inspection can be traced back to specific sand moisture levels, pour temperatures, or chemical compositions associated with that exact casting part.
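One way to represent this fused object in code is sketched below; the record type and stage names mirror the definition of $F_{part}(S_N)$ above but are otherwise illustrative, as are the example values.

```python
from dataclasses import dataclass, field

@dataclass
class FusedPartRecord:
    """Digital thread for one casting part, keyed by its serial number S_N.
    Each dict holds the records bound at that production stage."""
    serial_number: str
    melting: dict = field(default_factory=dict)   # D_melting(H)
    molding: dict = field(default_factory=dict)   # D_molding(M)
    cores: dict = field(default_factory=dict)     # D_cores(C)
    pouring: dict = field(default_factory=dict)   # D_pouring(P)
    testing: dict = field(default_factory=dict)   # D_testing(T)

def bind_stage(part: FusedPartRecord, stage: str, record: dict) -> None:
    """Attach a stage record as soon as the part (or its mold/heat) is
    scanned or automatically associated at that station."""
    getattr(part, stage).update(record)

# Example: the stamped part inherits data already bound to its heat and mold.
part = FusedPartRecord(serial_number="SN-2023-081512")
bind_stage(part, "melting", {"heat_id": "1023", "carbon_percent": 3.45})
bind_stage(part, "pouring", {"pour_temp_c": 1395.0, "pour_time_s": 12.4})
```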
Enabling Comprehensive Quality Traceability
The primary application of this fused data is end-to-end quality traceability. The system supports both forward traceability (tracking where a specific batch of raw material or a specific process parameter set ended up) and backward traceability (identifying the origin of a defect found in a finished casting part). The business process is fully digitized.
When a defect is detected during inspection—be it visual, dimensional, or from non-destructive testing—the inspector records it in the MES against the part’s serial number. The defect is coded (e.g., “POR-1” for shrinkage porosity), located (e.g., “CYLINDER_BORE_3”), and can be supplemented with a photograph. This creates a precise quality record.
For analysis, a quality engineer can query the system using the defective part’s serial number. The system retrieves the complete fused data object $F_{part}(S_N)$, presenting a holistic timeline view of that part’s manufacturing journey. The engineer can then cross-reference each process parameter against its defined control limits or optimal “golden batch” values. This allows for rapid hypothesis testing: “Was the pour temperature for this part lower than the standard?” or “Did this part come from sand with abnormally high moisture?”
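A minimal sketch of such a backward-traceability check, comparing the fused parameters of one part against control limits; the data layout, parameter names, and limits below are illustrative assumptions.

```python
def trace_back(fused_part: dict, control_limits: dict) -> list[str]:
    """fused_part maps stage name -> {parameter: value}, as assembled by the
    fusion step; control_limits maps parameter -> (low, high). Returns a list
    of parameters found outside their limits for the defective part."""
    findings = []
    for stage, parameters in fused_part.items():
        for name, value in parameters.items():
            if name in control_limits and isinstance(value, (int, float)):
                low, high = control_limits[name]
                if not low <= value <= high:
                    findings.append(f"{stage}.{name}={value} outside [{low}, {high}]")
    return findings

# Rapid hypothesis testing for a defective serial number:
findings = trace_back(
    {"pouring": {"pour_temp_c": 1372.0}, "sand": {"moisture_percent": 4.6}},
    {"pour_temp_c": (1400.0, 1460.0), "moisture_percent": (2.8, 3.6)},
)
```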
Furthermore, the aggregated fused data enables Statistical Process Control (SPC). Control charts ($\bar{X}-R$ charts, for example) can be automatically generated for any key parameter (e.g., melt temperature, carbon equivalent) over time or by batch. The system can provide real-time alerts when processes show signs of trending out of control, enabling proactive intervention before defective casting parts are produced. The SPC calculation for a parameter $x$ with subgroup size $n$ is fundamental:
$$ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad R = \max(x_i) - \min(x_i) $$
Control limits are then calculated as:
$$ UCL_{\bar{X}} = \bar{\bar{X}} + A_2 \bar{R}, \quad LCL_{\bar{X}} = \bar{\bar{X}} - A_2 \bar{R} $$
$$ UCL_{R} = D_4 \bar{R}, \quad LCL_{R} = D_3 \bar{R} $$
where $\bar{\bar{X}}$ and $\bar{R}$ are the averages of the subgroup averages and ranges, and $A_2, D_3, D_4$ are constants based on $n$. The fused data stream provides the $x_i$ values directly from the process, automating this entire monitoring workflow.
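A minimal sketch of this calculation over fused subgroup data follows; the subgroup layout is an assumption, and the constants quoted for $n = 5$ are the standard tabulated values.

```python
def xbar_r_limits(subgroups: list[list[float]], a2: float, d3: float, d4: float) -> dict:
    """Compute X-bar/R control limits for a key parameter (e.g. melt temperature).
    `subgroups` is a list of equally sized subgroups drawn from the fused data;
    a2, d3, d4 are the constants for that subgroup size
    (for n = 5: A2 = 0.577, D3 = 0, D4 = 2.114)."""
    xbars = [sum(group) / len(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    xbarbar = sum(xbars) / len(xbars)   # grand average of subgroup averages
    rbar = sum(ranges) / len(ranges)    # average range
    return {
        "UCL_xbar": xbarbar + a2 * rbar,
        "LCL_xbar": xbarbar - a2 * rbar,
        "UCL_R": d4 * rbar,
        "LCL_R": d3 * rbar,
    }

# Example with illustrative melt temperatures, subgroup size n = 5:
limits = xbar_r_limits(
    [[1412, 1408, 1415, 1410, 1409], [1406, 1411, 1414, 1407, 1410]],
    a2=0.577, d3=0.0, d4=2.114,
)
```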
Data Empowerment: From Traceability to Prediction and Optimization
The initial victory is achieving 100% traceability for complex casting parts like engine blocks and cylinder heads. This directly improves productivity by reducing time spent on manual traceability investigations and improves quality by enabling precise corrective actions. However, the greater value is unlocked over time as the repository of high-quality, fused historical data grows. This data becomes the fuel for advanced analytics and machine learning models, moving from reactive quality control to predictive quality assurance.
With a sufficient volume of fused production records, it becomes possible to build predictive quality models. These models identify complex, non-linear relationships between dozens of input parameters (from melting, sand, molding, etc.) and the final quality outcomes of the casting parts. A generic form of such a model could be a classification model predicting defect probability:
$$ P(\text{Defect} = \text{POR}) = f_{model}(x_1, x_2, \ldots, x_m) $$
where $x_1, \ldots, x_m$ are the fused process parameters (e.g., $x_1$ = Pour Temp, $x_2$ = Sand Compactability, $x_3$ = Carbon Content, ...) and $f_{model}$ could be a logistic regression, random forest, or neural network algorithm trained on historical data.
Once deployed, such a model can run in near-real-time. As a new mold is prepared or a new heat is tapped, the current process parameters are fed into the model. If the predicted probability of a specific defect exceeds a threshold, the system can alert process engineers or even trigger automatic feedback control loops—for instance, adjusting a cooling parameter or flagging the batch for enhanced inspection. This closes the loop from data collection to intelligent action, truly empowering the manufacturing process with data.
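As a hedged illustration of this loop, the sketch below trains such a classifier with scikit-learn on a tiny synthetic sample and scores a new heat against a threshold. The feature set, training values, labels, and alert threshold are placeholders, not results from real production data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: fused process parameters per part, e.g. [pour_temp_c, sand_compactability, carbon_pct]
# y: 1 if the part later showed the POR defect, 0 otherwise (historical labels).
X_train = np.array([[1395.0, 42.0, 3.45],
                    [1372.0, 48.0, 3.61],
                    [1410.0, 40.0, 3.40]])
y_train = np.array([0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Near-real-time scoring of a new mold/heat combination:
p_defect = model.predict_proba(np.array([[1380.0, 46.0, 3.55]]))[0, 1]
if p_defect > 0.2:  # illustrative alert threshold
    print(f"Predicted POR probability {p_defect:.2f}: flag batch for enhanced inspection")
```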
The journey towards intelligent foundry operations is fundamentally a data journey. It begins with the pragmatic integration of multi-source heterogeneous data, overcoming significant technical and architectural challenges to build a reliable digital thread for each casting part. This foundation of traceability is not an end in itself but a necessary platform. It enables the continuous improvement of existing processes and, ultimately, the deployment of powerful predictive and prescriptive analytics. The path forward for casting enterprises lies in recognizing data as a core strategic asset and building the fusion capabilities to harness its full potential for quality enhancement, cost reduction, and efficiency gains.
