Casting Surface Defect Detection Using an Enhanced YOLOv5-Based Model

In modern manufacturing, the quality inspection of casting parts is crucial, as surface defects such as cracks, pores, scratches, and inclusions can compromise product integrity, reduce lifespan, and even pose safety risks. Traditional inspection methods rely heavily on manual visual assessment by skilled personnel, which is subjective, time-consuming, and prone to errors such as missed detections and false alarms. To address these limitations, automated defect detection systems leveraging deep learning have gained prominence. In this work, I propose an improved YOLOv5 algorithm, termed CCD-YOLOv5, specifically designed for casting surface defect detection. The model integrates advanced data augmentation techniques, architectural modifications, and attention mechanisms to enhance accuracy and efficiency.

The core challenge in casting part defect detection lies in the diversity of defect types, subtle target features, and varying environmental conditions. For instance, defects like sand holes, shrinkage cavities, overlaps, damage, cracks, pores, and sticking manifest differently in casting parts, requiring robust feature extraction. My approach builds upon the YOLOv5 framework, incorporating data preprocessing, module replacements, and attention mechanisms to improve performance. Below, I detail the methodology, experimental setup, and results, supported by formulas and tables for clarity.

Data preprocessing is critical for training effective deep learning models, especially when dealing with limited datasets common in casting part inspection. I augmented the original dataset through geometric and photometric transformations to simulate real-world variations. Specifically, I applied horizontal flipping, vertical flipping, diagonal flipping, random rotations (e.g., ±30 degrees), and color adjustments including brightness, contrast, hue, and saturation changes. These operations increase sample diversity and help the model generalize to unseen casting part defects. For example, flipping can mimic different viewing angles of a casting part, while color adjustments account for lighting variations in production environments.
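The flipping operations can be sketched as follows. This is a minimal illustration, not the training code itself: it assumes labels are stored in the normalized YOLO format (x_center, y_center, width, height), represents an image as a plain list of pixel rows, and interprets "diagonal flipping" as a horizontal and a vertical flip combined. The function names `flip_image` and `flip_box` are hypothetical.

```python
def flip_image(image, horizontal=False, vertical=False):
    """Flip an image stored as a list of pixel rows (hypothetical helper)."""
    rows = [list(row) for row in image]
    if horizontal:
        rows = [row[::-1] for row in rows]  # mirror each row left-to-right
    if vertical:
        rows = rows[::-1]                   # reverse the order of the rows
    return rows


def flip_box(box, horizontal=False, vertical=False):
    """Flip one normalized (x_center, y_center, width, height) box.

    Only the center moves; width and height are unchanged by a flip.
    """
    x, y, w, h = box
    if horizontal:
        x = 1.0 - x
    if vertical:
        y = 1.0 - y
    return (x, y, w, h)


# Assumed reading of "diagonal flip": both flips applied together.
image = [[1, 2], [3, 4]]
box = (0.25, 0.75, 0.1, 0.2)
flipped_image = flip_image(image, horizontal=True, vertical=True)
flipped_box = flip_box(box, horizontal=True, vertical=True)
```

The important point is that every geometric transform applied to the image must also be applied to its bounding boxes, otherwise the labels no longer match the defects.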

To further enrich the dataset, I employed 9-Mosaic data augmentation, an extension of the traditional 4-Mosaic technique. This method randomly selects nine images, resizes them, and stitches them into a single composite image. The process enhances sample variety and increases the number of small targets, which is beneficial for detecting minor defects in casting parts. Mathematically, given input images $X_1, X_2, \ldots, X_9$, each with dimensions $H_i \times W_i$, the composite image $Y$ is generated by scaling each $X_i$ to a proportion of the target size $S \times S$ and arranging them in a grid. The labels are adjusted accordingly to maintain spatial correspondence. This augmentation reduces the model’s reliance on specific contextual patterns and improves robustness for casting part inspection.
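The label adjustment can be illustrated with a short sketch. It assumes a regular 3×3 grid of equally sized tiles and normalized YOLO-style labels (class, x_center, y_center, width, height); practical Mosaic implementations usually randomize tile sizes and crop positions, which is omitted here. The names `remap_label` and `build_mosaic_labels` are hypothetical.

```python
def remap_label(label, cell_index, grid=3):
    """Map a normalized label from tile `cell_index` into composite coordinates.

    Tiles are numbered row by row, 0..grid*grid-1 (an assumption of this sketch).
    """
    if not 0 <= cell_index < grid * grid:
        raise ValueError("cell_index outside the grid")
    cls, xc, yc, w, h = label
    row, col = divmod(cell_index, grid)
    # Each tile covers 1/grid of the composite along both axes.
    return (cls, (col + xc) / grid, (row + yc) / grid, w / grid, h / grid)


def build_mosaic_labels(per_image_labels, grid=3):
    """Collect the remapped labels of all grid*grid source images."""
    if len(per_image_labels) != grid * grid:
        raise ValueError("expected exactly grid*grid label lists")
    composite = []
    for cell_index, labels in enumerate(per_image_labels):
        composite.extend(remap_label(lab, cell_index, grid) for lab in labels)
    return composite


# Nine source images, each with one centered defect of class 2.
mosaic_labels = build_mosaic_labels([[(2, 0.5, 0.5, 0.6, 0.3)]] * 9)
```

Under this layout every box shrinks by a factor of three in each dimension, which is why the composite contains more small targets than the source images.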

The improved CCD-YOLOv5 model architecture modifies the native YOLOv5s framework to enhance feature extraction and detection efficiency. I replaced the C3 modules in the CSPDarknet53 backbone with C2f modules, which combine ideas from the C3 and ELAN structures. The C2f module enriches gradient information flow by integrating shallow and deep features, leading to better accuracy without significant computational overhead. For a feature map $F \in \mathbb{R}^{C \times H \times W}$, the C2f operation can be expressed as:

$$F_{\text{out}} = \text{Concat}(\text{Bottleneck}(F), \text{Skip}(F))$$

where $\text{Bottleneck}$ denotes a series of convolutions and activations, and $\text{Skip}$ represents a residual connection. This design mitigates redundancy and preserves essential details for casting part defects.

Additionally, I incorporated the Coordinate Attention (CA) mechanism to enhance feature representation. CA captures spatial dependencies by decomposing channel attention into horizontal and vertical directions, allowing the model to focus on defect regions in casting parts. Given an input feature map $X \in \mathbb{R}^{C \times H \times W}$, the CA mechanism computes attention weights as follows. First, average pooling is applied along the height and width dimensions to generate two directional feature maps:

$$u_h(h) = \frac{1}{W} \sum_{j=1}^{W} X_c(h, j), \quad u_w(w) = \frac{1}{H} \sum_{i=1}^{H} X_c(i, w)$$

These are concatenated and processed through a convolution and activation function:

$$f = \delta(F([u_h, u_w]))$$

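where $\delta$ is the ReLU function and $F$ denotes a convolutional layer. The result $f$ is then split along the spatial dimension into a height part $f_h$ and a width part $f_w$. The attention weights for height and width are obtained by applying separate convolutions $F_h$ and $F_w$ followed by sigmoid activations $\sigma$: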

$$g_h = \sigma(F_h(f_h)), \quad g_w = \sigma(F_w(f_w))$$

The final output feature map $Y$ is computed by element-wise multiplication:

$$Y_c(i, j) = X_c(i, j) \cdot g_h(i) \cdot g_w(j)$$

This mechanism improves the model’s ability to localize defects in casting parts by emphasizing relevant spatial regions.
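The pooling and reweighting steps can be traced on a single channel with a small pure-Python sketch. It is deliberately simplified: the learned convolutions $F$, $F_h$, and $F_w$ (and the ReLU between them) are replaced by the identity, which is an assumption made only to keep the example short, so the gates here are simply the sigmoid of the pooled values. The function name `coordinate_attention_channel` is hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention_channel(x):
    """Reweight one channel, given as H rows of W values.

    ASSUMPTION: identity mappings stand in for the learned convolutions,
    so this shows the data flow, not a trained attention module.
    """
    height, width = len(x), len(x[0])
    # Pool along the width: one value per row (height direction).
    u_h = [sum(row) / width for row in x]
    # Pool along the height: one value per column (width direction).
    u_w = [sum(x[i][j] for i in range(height)) / height for j in range(width)]
    g_h = [sigmoid(v) for v in u_h]
    g_w = [sigmoid(v) for v in u_w]
    # Y(i, j) = X(i, j) * g_h(i) * g_w(j)
    return [[x[i][j] * g_h[i] * g_w[j] for j in range(width)]
            for i in range(height)]


# The second row has stronger activations, so it receives a larger row gate.
feature = [[0.0, 0.0], [4.0, 4.0]]
attended = coordinate_attention_channel(feature)
```

Because each gate depends on only one coordinate, the product $g_h(i) \cdot g_w(j)$ encodes position along both axes at the cost of two one-dimensional vectors per channel.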

Furthermore, I substituted the coupled head in YOLOv5 with a decoupled head structure, which separates classification and regression tasks. This separation accelerates convergence and enhances precision for casting part defect detection. The decoupled head consists of two parallel branches: one for classification (predicting defect categories) and another for regression (predicting bounding box coordinates). Each branch includes convolutional layers and activation functions, with an additional IoU branch for regression refinement. The loss function combines classification loss $\mathcal{L}_{\text{cls}}$ and regression loss $\mathcal{L}_{\text{reg}}$:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{\text{cls}} + \lambda_2 \mathcal{L}_{\text{reg}}$$

where $\lambda_1$ and $\lambda_2$ are balancing weights. This design reduces spatial misalignment issues common in coupled approaches.

To evaluate the model, I conducted experiments on a dataset of casting part images covering seven defect types: sand hole, shrinkage, overlap, damage, crack, pore, and sticking. After augmentation, the training set comprised 1,863 images. I used a cross-validation strategy to ensure robustness. The experimental environment was configured with an Intel Core i5 CPU, 16 GB RAM, and an NVIDIA GeForce RTX 3070 GPU, running Python 3.9 and PyTorch 2.1. Training parameters included a batch size of 16, an initial learning rate of 0.0003, a weight decay of 0.001, and up to 500 epochs with early stopping based on validation loss.

The performance metrics were precision (P), recall (R), and mean average precision (mAP) at an IoU threshold of 0.5. These are defined as:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad \text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively, and $AP_i$ is the average precision for class $i$. For casting part defects, higher mAP values indicate better overall detection capability.
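These definitions translate directly into code. In the sketch below, the detection counts are hypothetical values chosen only for illustration, while the per-class AP values are those of the YOLOv5s row in Table 3; the function names are my own.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); returns 0.0 when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); returns 0.0 when there are no ground-truth defects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts for one defect class.
p = precision(tp=80, fp=20)   # 0.8
r = recall(tp=80, fn=25)      # 80 / 105

# Per-class AP (%) for YOLOv5s, taken from Table 3.
yolov5s_ap = [74.2, 70.3, 65.4, 83.4, 84.8, 83.5, 73.2]
yolov5s_map = round(mean_average_precision(yolov5s_ap), 1)  # 76.4
```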

I performed ablation studies to assess the contribution of each modification. The results are summarized in the following tables, which compare the native YOLOv5s with variants incorporating individual and combined improvements. Each metric is reported as a percentage, averaged across defect types.

Table 1: Comparison of Precision (P) for Different Model Configurations
Model	Average P (%)	Sand Hole (%)	Shrinkage (%)	Overlap (%)	Damage (%)	Crack (%)	Pore (%)	Sticking (%)
YOLOv5s	78.2	73.5	71.1	66.6	89.5	85.0	86.1	75.7
C2f-YOLOv5	80.2	72.9	78.4	69.6	88.9	86.2	87.6	78.1
CA-YOLOv5	79.3	72.3	75.9	67.3	88.1	89.4	85.0	76.9
Decoupled Head-YOLOv5	82.0	75.6	77.0	73.2	90.1	88.6	90.3	79.3
CCD-YOLOv5	82.9	77.6	76.7	71.4	93.7	89.9	90.0	80.8

Table 1 shows that the CCD-YOLOv5 model achieves the highest average precision (P) of 82.9%, a gain of 4.7 percentage points over YOLOv5s, indicating fewer false detections when identifying casting part defects. The decoupled head contributes most to this gain, as Decoupled Head-YOLOv5 alone reaches 82.0%. Defects like damage and pores are detected with high precision, while overlap and sand hole remain challenging due to their subtle features in casting parts.

Table 2: Comparison of Recall (R) for Different Model Configurations
Model	Average R (%)	Sand Hole (%)	Shrinkage (%)	Overlap (%)	Damage (%)	Crack (%)	Pore (%)	Sticking (%)
YOLOv5s	74.3	73.4	72.3	61.1	80.1	79.1	87.3	66.7
C2f-YOLOv5	75.5	75.3	71.8	67.2	80.4	77.2	89.1	67.8
CA-YOLOv5	75.8	72.2	73.2	67.0	79.9	80.2	90.4	67.9
Decoupled Head-YOLOv5	75.9	72.8	75.3	64.9	79.5	82.3	86.3	70.5
CCD-YOLOv5	76.7	74.2	76.4	63.7	83.1	81.2	89.0	69.4

Table 2 demonstrates that CCD-YOLOv5 achieves the highest average recall of 76.7%, a gain of 2.4 percentage points over YOLOv5s, meaning it misses fewer true defects in casting parts. The C2f module enhances recall by improving gradient flow, as evidenced in the C2f-YOLOv5 results. Defects like pores and cracks have high recall, while overlap and sticking show lower values due to their irregular shapes in casting parts.

Table 3: Comparison of mAP@0.5 for Different Model Configurations
Model	Average mAP (%)	Sand Hole (%)	Shrinkage (%)	Overlap (%)	Damage (%)	Crack (%)	Pore (%)	Sticking (%)
YOLOv5s	76.4	74.2	70.3	65.4	83.4	84.8	83.5	73.2
C2f-YOLOv5	78.3	75.3	71.2	64.8	87.1	82.4	90.9	76.3
CA-YOLOv5	79.0	76.3	78.2	65.9	85.9	84.5	87.6	74.8
Decoupled Head-YOLOv5	80.4	78.2	74.9	68.1	88.0	89.2	87.8	76.6
CCD-YOLOv5	81.0	77.4	74.0	69.5	89.1	88.2	93.3	75.9

Table 3 reveals that CCD-YOLOv5 attains the highest average mAP of 81.0%, a gain of 4.6 percentage points over YOLOv5s, signifying better overall detection performance for casting part defects. The CA attention mechanism contributes to this improvement by enhancing feature focus, as seen in CA-YOLOv5. Defects like pores and damage achieve high mAP, while overlap and shrinkage present challenges due to their variability in casting parts.

To further analyze the model’s effectiveness, I computed the F1-score, which balances precision and recall:

$$F1 = 2 \cdot \frac{P \cdot R}{P + R}$$

For CCD-YOLOv5, the average F1-score is 79.7%, compared to 76.2% for YOLOv5s, indicating a better balance between precision and recall in detecting casting part defects. Additionally, I evaluated computational cost using floating-point operations (FLOPs) and model size. The CCD-YOLOv5 model requires 18.2 GFLOPs with a size of 16.2 MB: slightly more computation than YOLOv5s (17.8 GFLOPs, 16.8 MB) at a marginally smaller size, and less of both than YOLOv8s (28.5 GFLOPs, 21.4 MB). This modest cost suggests that CCD-YOLOv5 is suitable for real-time use on casting part inspection lines.
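The reported F1-scores can be reproduced from the average precision and recall values in Tables 1 and 2. The short check below applies the F1 formula to those averages; the function name `f1_score` is my own.

```python
def f1_score(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Average P and R (%) from Tables 1 and 2.
f1_ccd = round(f1_score(82.9, 76.7), 1)      # 79.7
f1_yolov5s = round(f1_score(78.2, 74.3), 1)  # 76.2
```

Note that this is the F1 of the averaged precision and recall, not the average of per-class F1-scores; the two generally differ slightly.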

The training process involved monitoring loss curves to ensure convergence. The total loss $\mathcal{L}_{\text{total}}$ is a combination of classification loss $\mathcal{L}_{\text{cls}}$, regression loss $\mathcal{L}_{\text{reg}}$, and objectness loss $\mathcal{L}_{\text{obj}}$:

$$\mathcal{L}_{\text{total}} = \alpha \mathcal{L}_{\text{cls}} + \beta \mathcal{L}_{\text{reg}} + \gamma \mathcal{L}_{\text{obj}}$$

where $\alpha$, $\beta$, and $\gamma$ are weighting factors typically set to 1. With early stopping, training concluded after 350 epochs when validation loss plateaued. The learning rate decayed according to a cosine schedule:

$$\eta_t = \eta_{\text{min}} + \frac{1}{2}(\eta_{\text{max}} - \eta_{\text{min}})\left(1 + \cos\left(\frac{t}{T}\pi\right)\right)$$

where $\eta_t$ is the learning rate at epoch $t$, $\eta_{\text{min}}$ and $\eta_{\text{max}}$ are minimum and maximum rates, and $T$ is the total epochs. This strategy stabilized training and improved generalization for casting part defects.
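The schedule is easy to evaluate directly. The sketch below uses the initial learning rate of 0.0003 and the 500-epoch budget from the experimental setup as $\eta_{\text{max}}$ and $T$; the minimum rate is not stated in this article, so $\eta_{\text{min}} = 0$ is an assumption made for illustration.

```python
import math


def cosine_lr(t, total_epochs, lr_max, lr_min=0.0):
    """Learning rate at epoch t under cosine decay from lr_max to lr_min."""
    cosine = math.cos(math.pi * t / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# lr_min = 0.0 is assumed; lr_max and total_epochs follow the setup above.
lr_start = cosine_lr(0, 500, 3e-4)    # equals lr_max
lr_mid = cosine_lr(250, 500, 3e-4)    # halfway between lr_max and lr_min
lr_stop = cosine_lr(350, 500, 3e-4)   # rate at the early-stopping epoch
lr_end = cosine_lr(500, 500, 3e-4)    # equals lr_min
```

The rate falls slowly at first, fastest around the midpoint, and slowly again near the end, which avoids abrupt changes in step size late in training.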

In practice, the model processes input images of size 640×640 pixels, resized from original casting part photos. The defect detection pipeline involves pre-processing, inference, and post-processing steps. For a given casting part image $I$, the model outputs bounding boxes $B_i$ and class probabilities $p_i$ for each defect type. Non-maximum suppression (NMS) is applied to remove redundant detections based on IoU thresholding:

$$\text{IoU}(B_i, B_j) = \frac{\text{Area}(B_i \cap B_j)}{\text{Area}(B_i \cup B_j)}$$

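Within each class, the highest-scoring detection is kept and any remaining detection whose IoU with it exceeds 0.5 is discarded; the procedure repeats on the boxes that are left. This removes duplicate boxes for the same defect, so each defect in a casting part is reported once.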
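A minimal sketch of the IoU computation and of greedy NMS in its standard form, where lower-scoring overlapping boxes are suppressed, is given below. It assumes boxes in corner format (x1, y1, x2, y2) belonging to a single class; the example boxes and scores are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping detections of one defect and a separate third detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # [0, 2]
```

The first two boxes have an IoU of 81/119, roughly 0.68, so the lower-scoring one is suppressed, while the third box does not overlap either and is kept.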

Despite the improvements, challenges remain in detecting small or subtle defects in casting parts, such as fine cracks or light sticking. Future work could involve integrating multi-scale feature pyramids or generative adversarial networks (GANs) for synthetic data generation. Additionally, transfer learning from related industrial datasets may enhance performance. The modular design of CCD-YOLOv5 allows for easy integration of such advancements.

In conclusion, the proposed CCD-YOLOv5 model effectively addresses casting surface defect detection by combining data augmentation, architectural innovations, and attention mechanisms. The improvements lead to higher precision, recall, and mAP than the YOLOv5s baseline, making it a viable solution for quality control in casting part manufacturing. By leveraging deep learning, automated inspection systems can reduce human error, increase throughput, and ensure consistent quality for casting parts in diverse applications.