The pursuit of manufacturing excellence hinges on impeccable quality control. In the domain of metalworking, the casting process is foundational, producing complex casting part geometries used across automotive, aerospace, and machinery sectors. However, this process is inherently susceptible to various surface defects—such as gas pores, shrinkage cavities, cracks, and cold shuts—which can severely compromise the structural integrity, performance, and longevity of the final casting part. Traditional manual inspection is labor-intensive, subjective, and inefficient for modern high-volume production lines. Consequently, automated visual inspection systems powered by machine learning have become indispensable.
The efficacy of deep learning models, particularly convolutional neural networks (CNNs) for defect detection and classification, is critically dependent on the availability of large-scale, high-quality, and well-annotated datasets. In industrial settings, acquiring a sufficient number of defective samples is a significant bottleneck. Defects are, by nature, anomalies; thus, collecting thousands of images of flawed casting part surfaces is often impractical and economically burdensome. This data scarcity leads to imbalanced datasets, where defective samples are vastly outnumbered by non-defective ones, causing models to become biased, generalize poorly, and exhibit high false-negative rates—a critical failure in quality assurance.
To mitigate this, data augmentation techniques are universally employed. Conventional methods include geometric transformations (rotation, flipping, scaling), color space adjustments, and random noise injection. While these techniques expand the dataset size, they merely create variations of existing samples without generating fundamentally new defect morphologies or contexts. They lack the capacity to learn and reproduce the underlying complex data distribution of authentic casting part surface flaws, offering limited gains in model generalization.

Generative Adversarial Networks (GANs) present a paradigm-shifting solution to this challenge. Introduced by Goodfellow et al., GANs consist of two neural networks—a Generator (G) and a Discriminator (D)—engaged in a continuous adversarial game. The generator learns to map random noise from a latent space to synthetic data samples, while the discriminator learns to distinguish between real samples from the training dataset and fake samples produced by G. This competition drives both networks to improve iteratively, with the ultimate goal of the generator producing samples so realistic that the discriminator cannot tell them apart from genuine data. For casting part inspection, this means the potential to synthesize an unlimited number of novel, realistic defect images, thereby enriching the training dataset with diverse failure modes.
Among GAN variants, Deep Convolutional GANs (DCGANs) established critical architectural guidelines for stable training using CNNs, replacing fully connected layers with convolutional and transposed convolutional layers. Despite this progress, the standard DCGAN framework still faces persistent issues when applied to niche domains like industrial defect generation: training instability (mode collapse, non-convergence), generation of low-fidelity images with artifacts, and insufficient capture of the fine-grained defect features crucial for accurate casting part analysis.
To overcome these limitations and effectively address the data scarcity problem in casting part surface inspection, I propose an enhanced generative model termed Attention-enhanced Modified DCGAN (AMDCGAN). This model integrates advanced attention mechanisms and an adaptive activation function into the DCGAN architecture, significantly boosting its feature representation capability, training stability, and the visual quality of generated defect samples.
Architectural Innovations in AMDCGAN
The core innovation of the AMDCGAN lies in its strategic modifications to both the generator and discriminator networks, moving beyond the basic DCGAN blueprint to achieve more focused and stable learning specifically for the casting part defect domain.
1. Reinforcing Feature Learning with Attention Mechanisms
Attention mechanisms enable neural networks to focus computational resources on the most informative parts of the input data, mimicking human perceptual systems. In the context of generating casting part defects, this means prioritizing the synthesis of defect regions and their intricate textures over homogeneous background areas.
ECA-Net in the Generator: I incorporate the Efficient Channel Attention (ECA) module into the generator’s deeper layers. Channel attention aims to model interdependencies between feature maps (channels). Standard channel attention modules such as the Squeeze-and-Excitation (SE) block employ dimensionality reduction, which can negatively impact performance. The ECA module avoids this by implementing a local cross-channel interaction strategy without reduction, using a fast one-dimensional convolution of size \(k\), where \(k\) is adaptively determined by the channel dimensionality \(C\):
$$ k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\text{odd}}, $$
where \(|t|_{\text{odd}}\) denotes the odd number nearest to \(t\), and \(\gamma\) and \(b\) are hyperparameters (set to 2 and 1, respectively, in the original ECA formulation).
For an input feature map \(\mathbf{y}\), the channel attention weights \(\mathbf{w}\) are computed as:
$$ \mathbf{w} = \sigma(\text{C1D}_k(\mathbf{y})), $$
where \(\sigma\) is the sigmoid function and \(\text{C1D}_k\) denotes the 1D convolution. This lightweight module allows the generator to adaptively emphasize more meaningful feature channels responsible for defect characteristics, leading to more semantically rich and detailed output for the casting part surface.
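To make this concrete, below is a minimal PyTorch sketch of an ECA block consistent with the formulas above; it mirrors the structure of the reference implementation, with \(\gamma = 2\) and \(b = 1\) as defaults.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: local cross-channel interaction
    via a fast 1D convolution, with no dimensionality reduction."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size k = |log2(C)/gamma + b/gamma|_odd
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 == 1 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        # x: (N, C, H, W) -> per-channel descriptor (N, C, 1, 1)
        y = self.pool(x)
        # Treat channels as a sequence: (N, 1, C) -> 1D conv -> (N, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        return x * torch.sigmoid(y)
```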
EMA in the Discriminator: In the discriminator, I embed the Efficient Multi-scale Attention (EMA) module. EMA is a multi-scale attention mechanism designed to capture cross-spatial information efficiently. It partitions the input feature map into \(G\) groups along the channel dimension and processes them through parallel sub-networks. One branch employs 1×1 convolutions combined with efficient cross-channel interaction, while a parallel branch uses a 3×3 convolution to capture fine local spatial patterns. The outputs are then fused, and spatial attention weights are generated. This design enables the discriminator to scrutinize both global contextual relationships and local textural details of a casting part image simultaneously, making it a more powerful critic. This, in turn, provides higher-quality feedback to the generator, pushing it to synthesize defects that are locally consistent and globally coherent.
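The full EMA module also involves directional pooling and cross-spatial matrix fusion; the simplified sketch below captures only its core grouped, two-branch structure (a 1×1 cross-channel branch plus a 3×3 local branch, fused into spatial weights) and should be read as an illustration rather than the published module.

```python
import torch
import torch.nn as nn

class GroupedDualBranchAttention(nn.Module):
    """Simplified EMA-style attention: channels are split into groups,
    a 1x1 branch models cross-channel interaction, a 3x3 branch captures
    local spatial detail, and the fused result yields spatial attention
    weights. The published EMA module adds directional pooling and
    cross-spatial matmul fusion, omitted here for brevity."""
    def __init__(self, channels, groups=8):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        cg = channels // groups
        self.branch1x1 = nn.Conv2d(cg, cg, kernel_size=1)
        self.branch3x3 = nn.Conv2d(cg, cg, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(cg, 1, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        g = x.reshape(n * self.groups, c // self.groups, h, w)
        fused = self.branch1x1(g) + self.branch3x3(g)  # global + local cues
        attn = torch.sigmoid(self.fuse(fused))         # (n*G, 1, h, w) weights
        return (g * attn).reshape(n, c, h, w)
```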
2. Enhancing Non-Linearity with Meta-AconC Activation
The choice of activation function profoundly influences gradient flow and learning dynamics. The ReLU function, standard in DCGAN, is non-differentiable at zero and can cause “dying” neurons whose gradients vanish permanently. For the complex task of learning the manifold of casting part defects, a more flexible activation is beneficial.
I replace the ReLU activations in the generator (except the final layer using Tanh) with the Meta-AconC function. Meta-AconC belongs to the ACON family of activations, which can smoothly switch between linear and non-linear states. Its key formula is:
$$ \text{Meta-AconC}(x) = (p_1 - p_2)x \cdot S(\beta (p_1 - p_2)x) + p_2 x, $$
where \(S(x) = (1 + e^{-x})^{-1}\) is the sigmoid function, \(p_1\) and \(p_2\) are learnable parameters initializing the upper and lower bounds of the activation, and \(\beta\) is a key meta-learned parameter controlling the switching behavior. The gradient is:
$$ \frac{d}{dx} \text{Meta-AconC}(x) = (p_1 - p_2)S(\beta (p_1 - p_2)x) + \beta (p_1 - p_2)^2 x\, S(\beta (p_1 - p_2)x)\bigl(1 - S(\beta (p_1 - p_2)x)\bigr) + p_2. $$
When \(\beta \to \infty\), the function approaches the non-linear \(\max(p_1 x, p_2 x)\) (the Swish function is the special case \(p_1 = 1\), \(p_2 = 0\)); when \(\beta \to 0\), it degenerates to the linear \(\frac{p_1 + p_2}{2} x\). By meta-learning \(\beta\) channel-wise during training, each channel of the generator can adaptively decide its activation regime, leading to richer representations and significantly improved training stability for the casting part defect generation task.
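A compact PyTorch sketch of Meta-AconC follows, with \(\beta\) produced channel-wise by a small squeeze-style meta-network over globally pooled features; the batch-normalization layers of the reference implementation are omitted here for brevity.

```python
import torch
import torch.nn as nn

class MetaAconC(nn.Module):
    """Meta-AconC: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x,
    where beta is meta-learned per channel from pooled features."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.p1 = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2 = nn.Parameter(torch.randn(1, channels, 1, 1))
        hidden = max(r, channels // r)
        self.fc1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.fc2 = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        # Channel-wise beta from a global average pooled descriptor
        beta = torch.sigmoid(self.fc2(self.fc1(x.mean(dim=(2, 3), keepdim=True))))
        dp = (self.p1 - self.p2) * x
        return dp * torch.sigmoid(beta * dp) + self.p2 * x
```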
3. Network Architecture Specifications
The detailed architecture of the proposed AMDCGAN is structured as follows:
Generator (G): The input is a 100-dimensional random noise vector \(\mathbf{z} \sim \mathcal{N}(0, I)\). It is projected and reshaped into a 512x4x4 feature map. Five transposed convolutional layers (fractionally-strided convolutions) then upscale the spatial resolution. The ECA module is inserted between the final two transposed convolutional layers. All intermediate layers use Meta-AconC activation, while the output layer uses Tanh to produce a 96x96x3 RGB image of a synthetic casting part defect. The output size \(H_{out}\) of a transposed convolutional layer is calculated as:
$$ H_{out} = \text{stride} \times (H_{in} - 1) + \text{kernel\_size} - 2 \times \text{padding}. $$
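For instance, under an assumed configuration of stride 2, kernel size 4, and padding 1 (a common DCGAN doubling layer), a 48×48 feature map is upsampled to the final 96×96 output:
$$ H_{out} = 2 \times (48 - 1) + 4 - 2 \times 1 = 96. $$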
Discriminator (D): The input is a 96x96x3 image (real or fake). It passes through four convolutional layers for downsampling. The EMA attention module is embedded before the final convolutional layer. LeakyReLU activations are used in all intermediate layers. The final layer flattens the features and uses a sigmoid activation to output a scalar probability of the input being a real casting part defect image.
| Network | Layer Sequence | Key Modifications |
|---|---|---|
| Generator (G) | FC → Reshape → TConv1 (Meta-AconC) → TConv2 (Meta-AconC) → TConv3 (Meta-AconC) → TConv4 (Meta-AconC) → ECA → TConv5 (Tanh) | ECA module after TConv4; Meta-AconC activations. |
| Discriminator (D) | Conv1 (LeakyReLU) → Conv2 (LeakyReLU) → Conv3 (LeakyReLU) → EMA → Conv4 (LeakyReLU) → Flatten → Sigmoid | EMA module before final Conv4. |
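Putting the pieces together, the sketch below assembles a generator and discriminator consistent with the table, reusing the ECA, MetaAconC, and GroupedDualBranchAttention sketches above. The channel widths and the stride/kernel choices (an initial 3/1/0 layer taking 4×4 to 6×6, followed by 4/2/1 doubling layers) are illustrative assumptions that realize the 4×4 → 96×96 progression; batch normalization follows the usual DCGAN guideline.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 512 * 4 * 4)
        def up(cin, cout, k, s, p):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, k, s, p, bias=False),
                nn.BatchNorm2d(cout),
                MetaAconC(cout),
            )
        self.body = nn.Sequential(
            up(512, 256, 3, 1, 0),               # TConv1: 4 -> 6
            up(256, 128, 4, 2, 1),               # TConv2: 6 -> 12
            up(128, 64, 4, 2, 1),                # TConv3: 12 -> 24
            up(64, 32, 4, 2, 1),                 # TConv4: 24 -> 48
            ECA(32),                             # channel attention before final upsample
            nn.ConvTranspose2d(32, 3, 4, 2, 1),  # TConv5: 48 -> 96
            nn.Tanh(),
        )

    def forward(self, z):
        return self.body(self.fc(z).view(-1, 512, 4, 4))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def down(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 4, 2, 1, bias=False),
                nn.LeakyReLU(0.2, inplace=True),
            )
        self.body = nn.Sequential(
            down(3, 64),                      # Conv1: 96 -> 48
            down(64, 128),                    # Conv2: 48 -> 24
            down(128, 256),                   # Conv3: 24 -> 12
            GroupedDualBranchAttention(256),  # EMA-style module before final conv
            down(256, 512),                   # Conv4: 12 -> 6
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(512 * 6 * 6, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.body(x))
```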
Experimental Framework and Evaluation
To validate the effectiveness of the AMDCGAN for casting part surface defect generation, a comprehensive experimental study was conducted.
Dataset and Implementation
A custom dataset was constructed by cropping high-resolution images of actual defective casting surfaces. The final training set consists of 200 image patches of size 96×96 pixels, containing various defect types such as porosity and inclusions. This limited dataset size intentionally simulates the real-world data-scarce scenario. The model was implemented using PyTorch and trained on an NVIDIA RTX 3060 GPU. The Adam optimizer was used with a learning rate of 0.001 for the generator and 0.0001 for the discriminator, with a batch size of 8 for 500 epochs.
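A minimal training-loop sketch with these hyperparameters is shown below; the non-saturating BCE losses and the Adam momentum terms (betas of 0.5 and 0.999, the usual DCGAN choice) are assumptions, and `loader` stands in for a dataset pipeline yielding 96×96 patches normalized to [-1, 1].

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
G, D = Generator().to(device), Discriminator().to(device)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

for epoch in range(500):
    for real in loader:  # assumed to yield (8, 3, 96, 96) tensors in [-1, 1]
        real = real.to(device)
        n = real.size(0)
        fake = G(torch.randn(n, 100, device=device))

        # Discriminator step: push real -> 1, fake -> 0
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(n, 1, device=device)) \
               + bce(D(fake.detach()), torch.zeros(n, 1, device=device))
        loss_d.backward()
        opt_d.step()

        # Generator step: fool the discriminator into predicting 1 on fakes
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(n, 1, device=device))
        loss_g.backward()
        opt_g.step()
```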
Evaluation Metric
Beyond visual inspection, the Fréchet Inception Distance (FID) is employed as the primary quantitative metric. FID measures the similarity between the distributions of real and generated images by comparing statistics derived from a pre-trained Inception-v3 network. A lower FID score indicates higher similarity and better generative performance. It is computed as:
$$ \text{FID}(\mathbf{x}, \mathbf{g}) = \|\mu_x - \mu_g\|_2^2 + \text{Tr}\left( \Sigma_x + \Sigma_g - 2(\Sigma_x \Sigma_g)^{1/2} \right), $$
where \(\mathbf{x}\) and \(\mathbf{g}\) are real and generated feature sets, and \((\mu_x, \Sigma_x)\) and \((\mu_g, \Sigma_g)\) are their mean and covariance matrices, respectively.
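Given Inception-v3 feature statistics for the two sets, the Fréchet distance itself reduces to a few lines of NumPy/SciPy; the sketch below implements the formula above (feature extraction is assumed to happen upstream).

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_x, sigma_x, mu_g, sigma_g):
    """FID between two Gaussians fitted to Inception-v3 features."""
    diff = mu_x - mu_g
    # Matrix square root of the covariance product; tiny imaginary
    # components can appear from numerical error and are discarded.
    covmean, _ = linalg.sqrtm(sigma_x @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_x + sigma_g - 2.0 * covmean)
```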
Results and Analysis
The performance of the proposed AMDCGAN is compared against the baseline DCGAN and analyzed through ablation studies.
1. Qualitative Generation Results
The visual progression of generated images across training epochs reveals the superiority of AMDCGAN. At early epochs (e.g., 100), both models produce blurry shapes with only rudimentary structure. As training progresses, the baseline DCGAN becomes unstable, often generating noisy or repetitive patterns (mode collapse). In contrast, AMDCGAN demonstrates stable convergence, with generated images gradually developing clearer, more defined defect-like features. By epoch 500, AMDCGAN synthesizes images with realistic defect textures and shapes that closely resemble authentic casting part surface flaws, whereas DCGAN outputs remain less coherent and contain more artifacts.
2. Quantitative Performance Comparison
The quantitative evaluation solidifies the visual observations. The FID score is calculated between 2000 generated images and the entire real training set. The results are summarized below:
| Model | FID Score (Lower is Better) | Relative Improvement |
|---|---|---|
| Baseline DCGAN | 19.07 | – |
| Proposed AMDCGAN | 7.54 | ≈ 60.5% Reduction |
The dramatic reduction in FID from 19.07 to 7.54 confirms that the distribution of images generated by AMDCGAN is significantly closer to the real casting part defect data distribution than those from the standard DCGAN.
3. Ablation Study
To dissect the contribution of each proposed component, a systematic ablation study was performed. Starting from the DCGAN baseline, modules were incrementally added.
| ECA Module | Meta-AconC | EMA Module | FID Score | Observation |
|---|---|---|---|---|
| ✗ | ✗ | ✗ | 19.07 | Baseline: Unstable training, poor detail. |
| ✓ | ✗ | ✗ | 17.29 | Improved feature focus, but instability remains. |
| ✓ | ✓ | ✗ | 10.18 | Stable training, better details, some noise. |
| ✓ | ✓ | ✓ | 7.54 | Best: High-fidelity, detailed, and clean outputs. |
The results clearly demonstrate the cumulative benefit of each innovation:
- Adding ECA to the generator (FID: 17.29) provides an initial boost by enhancing channel-wise feature awareness for defect synthesis.
- Replacing ReLU with Meta-AconC (FID: 10.18) yields the most significant single improvement, underscoring the critical role of adaptive activation in stabilizing GAN training for complex data like casting part defects.
- Integrating the EMA module into the discriminator (FID: 7.54) further refines the output quality, as the enhanced discriminator guides the generator to produce images with more accurate local and global structures.
Each component is validated as essential for achieving the final high-performance model.
Conclusion and Industrial Implications
This work presents AMDCGAN, a robust and effective deep learning framework designed to address the critical challenge of data scarcity in automated visual inspection of casting part surfaces. By integrating the ECA attention mechanism for focused feature generation, the Meta-AconC activation function for adaptive and stable learning dynamics, and the EMA attention mechanism for powerful discriminative analysis, the proposed model significantly outperforms the standard DCGAN baseline.
The quantitative results, evidenced by a 60.5% reduction in FID score, and qualitative assessments confirm that AMDCGAN can synthesize high-fidelity, diverse images of surface defects that are virtually indistinguishable from real casting part anomalies. The successful ablation study provides clear insights into the contribution of each architectural modification.
The primary industrial implication is substantial: inspection systems for casting part quality control can be fortified using synthetic data generated by AMDCGAN. By augmenting limited real-world datasets with large volumes of controllable synthetic defects, manufacturers can train more accurate, robust, and generalizable deep learning classifiers and detectors. This leads to reduced reliance on physical defect samples, lower costs associated with data collection, and ultimately, higher product quality and reliability. Future work will focus on extending this framework to generate defect images at higher resolutions and exploring conditional generation for specific, user-defined defect types on varied casting part geometries.
