Improved YOLOv8 for Casting Part Defect Detection

In the field of industrial manufacturing, the quality inspection of casting parts is a critical step to ensure product reliability and longevity. As international trade expands, particularly after China’s accession to the WTO, the demand for high-quality casting parts has surged. However, casting parts often suffer from various defects during processing and usage, with metallic projections (such as burrs, sand inclusions, and excess material) being among the most prevalent. These defects can compromise the structural integrity and performance of casting parts, leading to increased waste and safety concerns. Traditional manual inspection methods are labor-intensive, time-consuming, and prone to human error, especially in complex industrial environments. Therefore, automated defect detection systems based on computer vision have gained significant attention. In recent years, research in this area has evolved from traditional machine learning approaches to deep learning-based methods, with the latter showing superior performance in handling complex and variable defect patterns. In this paper, we propose an enhanced YOLOv8 model, dubbed YOLOv8-LQ, specifically designed for detecting metallic projection defects in casting parts. Our improvements focus on addressing challenges such as limited datasets, difficult detection in complex environments, and low efficiency. We incorporate advanced techniques including data augmentation, backbone network replacement, neck reconstruction, attention mechanisms, and optimized loss functions to boost the model’s robustness and accuracy. Through extensive experiments, we demonstrate that our model achieves higher mean average precision (mAP) and better detection rates compared to baseline YOLOv8 and other variants, making it suitable for real-world casting part inspection scenarios.

The inspection of casting parts is essential for maintaining high standards in industries like automotive, aerospace, and machinery. Defects in casting parts, such as metallic projections, can arise from factors like mold wear, improper pouring, or material inconsistencies. These defects are often small, irregularly shaped, and embedded in complex backgrounds, making them difficult to detect with conventional methods. Traditional computer vision techniques rely on handcrafted features and assumptions about data distribution, which may not generalize well to diverse casting part defects. For instance, methods based on edge detection or template matching can fail when defects vary in size, orientation, or texture. With the advent of deep learning, convolutional neural networks (CNNs) have revolutionized defect detection by learning hierarchical features directly from data. Models like YOLO (You Only Look Once) have become popular due to their speed and accuracy in real-time object detection. However, applying standard YOLO models to casting part defect detection poses challenges, including limited annotated datasets, class imbalance, and the need for precise localization of small defects. To overcome these issues, we present a comprehensive improvement of the YOLOv8 architecture, integrating several novel components that enhance feature extraction, multi-scale fusion, and bounding box regression. Our work contributes to the field by providing a tailored solution for metallic projection detection, which can be extended to other defect types in casting parts.

Our proposed YOLOv8-LQ model builds upon the baseline YOLOv8, which is known for its efficient balance between speed and accuracy. The key improvements include: replacing the original CSPDarknet53 backbone with a ResNet-based architecture to improve gradient flow and feature representation; reconstructing the neck network using a lightweight block called C2f-Star to reduce parameters while maintaining feature fusion capabilities; incorporating a Dynamic Head module with multi-head self-attention to enhance scale, spatial, and task-aware feature learning; and adopting the MPDIoU loss function for better bounding box regression. These modifications are designed to address the specific characteristics of casting part defects, such as their small size and subtle appearance. In the following sections, we detail each component, present experimental results on a custom dataset, and discuss the implications for industrial applications. We also provide mathematical formulations and tables to summarize our approach and findings, ensuring clarity and reproducibility. Throughout this paper, we emphasize the importance of casting part quality and how our model can aid in automated inspection systems, ultimately reducing costs and improving product reliability.

Before delving into the methodology, it is worthwhile to review related work in casting part defect detection. Early approaches used traditional image processing techniques, such as morphological operations, thresholding, and texture analysis. For example, some methods applied wavelet transforms to highlight defect regions in casting parts, but they often required careful parameter tuning and struggled with noise. Machine learning algorithms, including support vector machines (SVMs) and random forests, were later employed with handcrafted features like Histogram of Oriented Gradients (HOG) or Local Binary Patterns (LBP). While these methods showed promise, they were limited by their dependency on feature engineering and lack of adaptability to new defect patterns. The rise of deep learning has shifted the paradigm, with CNNs enabling end-to-end learning from raw images. In particular, object detection networks like Faster R-CNN, SSD, and YOLO have been adapted for defect detection. For casting parts, prior studies have explored variants of YOLOv3 and YOLOv4 for detecting scratches, pores, and other imperfections. However, metallic projection defects remain understudied due to their complexity and scarcity of data. Our work extends this line of research by focusing on YOLOv8, the latest iteration in the YOLO series, and enhancing it with state-of-the-art techniques to tackle metallic projections in casting parts.

The core of our approach lies in the architectural modifications to YOLOv8. We first describe the baseline YOLOv8 model, which consists of a backbone for feature extraction, a neck for multi-scale feature fusion, and a head for detection. The backbone uses CSPDarknet53, incorporating Cross Stage Partial connections to reduce computational cost. The neck employs Path Aggregation Network (PAN) and Feature Pyramid Network (FPN) structures to combine features from different levels. The detection head outputs bounding boxes, objectness scores, and class probabilities. While effective, this baseline may not fully capture the nuances of casting part defects. Hence, we introduce the following improvements.

Backbone Replacement with ResNet: We replace the CSPDarknet53 backbone with a ResNet-50 architecture. ResNet introduces residual blocks that mitigate the vanishing gradient problem in deep networks, allowing for more effective training and richer feature extraction. Each residual block contains two convolutional layers with batch normalization and ReLU activation, followed by a skip connection that adds the input to the output. This can be formulated as:

$$ y = \mathcal{F}(x, \{W_i\}) + x $$

where \( x \) is the input, \( \mathcal{F} \) represents the residual function (e.g., two convolutions), and \( y \) is the output. By stacking such blocks, ResNet can learn complex features relevant to casting part defects, such as edges, textures, and shapes. We choose ResNet-50 for its balance between depth and efficiency, ensuring that the model remains suitable for real-time applications while improving feature representation for small defects in casting parts.
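The skip connection above can be sketched in a few lines. This is a minimal illustration of the identity \( y = \mathcal{F}(x) + x \) using plain Python lists; the toy function `toy_f` merely stands in for the real conv-BN-ReLU stack of a residual block and is not part of the actual model.

```python
# Minimal sketch of a ResNet residual block's skip connection, using plain
# Python lists in place of real convolutional layers (illustrative only).

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, f):
    """y = F(x, {W_i}) + x : the residual function F plus a skip connection."""
    fx = f(x)                                 # F(x), e.g. conv-BN-ReLU-conv-BN
    assert len(fx) == len(x), "skip connection needs matching shapes"
    return [a + b for a, b in zip(fx, x)]     # element-wise add of the input

# Hypothetical stand-in for the learned residual function:
toy_f = lambda x: relu([0.5 * a - 1.0 for a in x])

y = residual_block([2.0, -1.0, 4.0], toy_f)   # -> [2.0, -1.0, 5.0]
```

Because the input is added back unchanged, gradients can flow through the skip path even when \( \mathcal{F} \) contributes little, which is what makes very deep backbones trainable.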

Neck Reconstruction with C2f-Star: The neck network in YOLOv8 uses C2f modules, which involve multiple bottleneck blocks and split operations. To reduce parameter count and computational overhead, we redesign this as C2f-Star. In C2f-Star, we simplify the tensor combination modes to only summation and element-wise multiplication. The element-wise multiplication for two vectors \( w_1 \) and \( w_2 \) of dimension \( d+1 \) can be expressed as:

$$ w_1^T x * w_2^T x = \left( \sum_{i=1}^{d+1} w_{1i} x_i \right) * \left( \sum_{j=1}^{d+1} w_{2j} x_j \right) = \sum_{i=1}^{d+1} \sum_{j=1}^{d+1} w_{1i} w_{2j} x_i x_j $$

This operation implicitly projects the input into a higher-dimensional space, enabling nonlinear feature combinations without adding excessive parameters. By using C2f-Star in the neck, we maintain effective feature fusion from different scales while keeping the model lightweight, which is crucial for deploying on resource-constrained devices in casting part factories.
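The implicit high-dimensional projection can be verified numerically: the product of two linear projections equals the double sum over all pairwise terms \( w_{1i} w_{2j} x_i x_j \). The sketch below checks this identity on toy vectors; it demonstrates only the algebra behind the star operation, not the C2f-Star module itself.

```python
# The "star" identity behind C2f-Star: (w1.x) * (w2.x) expands into a sum over
# all pairwise terms w1_i * w2_j * x_i * x_j, i.e. an implicit projection into
# a quadratic (higher-dimensional) feature space without extra parameters.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def star_direct(w1, w2, x):
    return dot(w1, x) * dot(w2, x)

def star_expanded(w1, w2, x):
    # double sum over i, j of w1_i * w2_j * x_i * x_j
    return sum(w1[i] * w2[j] * x[i] * x[j]
               for i in range(len(x)) for j in range(len(x)))

w1, w2, x = [1.0, 2.0, 0.5], [0.5, -1.0, 3.0], [2.0, 1.0, -1.0]
assert abs(star_direct(w1, w2, x) - star_expanded(w1, w2, x)) < 1e-9
```

The direct form costs two dot products per output, while the expanded form shows that the network effectively mixes \((d+1)^2\) feature interactions.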

Dynamic Head with Attention Mechanisms: We integrate a Dynamic Head module at the output end of the network to enhance the detection head’s ability to perceive and express objects. The Dynamic Head employs three types of attention: scale-aware attention (\( \pi_L \)), spatial-aware attention (\( \pi_S \)), and task-aware attention (\( \pi_C \)). These are applied sequentially to the feature maps. For scale-aware attention, we use a deformable convolution to learn sparse sampling locations across feature levels. The attention weights are computed as:

$$ \pi_L(F) = \sigma(f_L(F)) \odot F $$

where \( F \) is the input feature, \( f_L \) is a linear transformation, \( \sigma \) is the sigmoid function, and \( \odot \) denotes element-wise multiplication. Similarly, spatial-aware attention focuses on important regions, and task-aware attention modulates channels for different detection tasks. By combining these attentions, the Dynamic Head improves the model’s robustness to scale variations, occlusions, and complex backgrounds commonly encountered in casting part images.
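The sigmoid-gated pattern \( \pi(F) = \sigma(f(F)) \odot F \) shared by the three attentions can be sketched in one dimension. The linear map `f_linear` below is a hypothetical stand-in for \( f_L \); the real module uses deformable convolutions over multi-level feature maps, which this toy version does not attempt to reproduce.

```python
import math

# Toy sketch of the sigmoid-gated attention pattern pi(F) = sigma(f(F)) * F
# used by the Dynamic Head, applied element-wise to a 1-D feature list.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_attention(features, f):
    gates = [sigmoid(f(v)) for v in features]         # attention weights in (0, 1)
    return [g * v for g, v in zip(gates, features)]   # element-wise reweighting

f_linear = lambda v: 2.0 * v - 1.0    # hypothetical stand-in for f_L
out = gated_attention([0.0, 1.0, -1.0], f_linear)
# strong activations are largely kept; weak ones are damped toward zero
```

Because the gate is bounded in \((0, 1)\), attention can only attenuate features, never amplify them, which keeps the reweighting stable during training.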

MPDIoU Loss Function: For bounding box regression, we adopt the MPDIoU loss, which considers both overlap and non-overlap areas, center point distances, and width-height deviations. Given a predicted box \( B \) and a ground truth box \( A \), the MPDIoU is defined as:

$$ \text{MPDIoU} = \frac{|A \cap B|}{|A \cup B|} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2} $$

where \( d_1 \) and \( d_2 \) are the distances between the top-left and bottom-right corners of the two boxes, respectively, and \( w \) and \( h \) are the width and height of the input image. The loss function is then \( \mathcal{L}_{\text{MPDIoU}} = 1 - \text{MPDIoU} \). This loss simplifies computation and directly optimizes the bounding box parameters, leading to more accurate localization of defects in casting parts.
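The definition above translates directly into code. The sketch below is a plain-Python rendering of the formula as stated (IoU minus the two normalized squared corner distances), with boxes given as `(x1, y1, x2, y2)`; it is an illustration of the loss, not the vectorized implementation used in training.

```python
# Sketch of MPDIoU as defined above: IoU minus the normalized squared distances
# between the top-left and bottom-right corners. Boxes are (x1, y1, x2, y2);
# w, h are the width and height of the input image.

def mpdiou(box_a, box_b, w, h):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection and union areas
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2   # top-left corner distance^2
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2   # bottom-right corner distance^2
    diag_sq = w ** 2 + h ** 2
    return iou - d1_sq / diag_sq - d2_sq / diag_sq

def mpdiou_loss(box_a, box_b, w, h):
    return 1.0 - mpdiou(box_a, box_b, w, h)

# identical boxes give MPDIoU = 1, i.e. zero loss
assert mpdiou_loss((10, 10, 50, 50), (10, 10, 50, 50), 640, 640) == 0.0
```

Note that for non-overlapping boxes the corner terms still produce a gradient signal (the loss exceeds 1), which is exactly the property that plain IoU loss lacks.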

To summarize our model architecture, we provide a comparison of components in Table 1.

| Component | Baseline YOLOv8 | YOLOv8-LQ (Our Model) |
|---|---|---|
| Backbone | CSPDarknet53 | ResNet-50 |
| Neck | C2f with bottlenecks | C2f-Star with summation and multiplication |
| Detection Head | Standard head | Dynamic Head with attention |
| Loss Function | CIoU | MPDIoU |

We now turn to the experimental setup. Our dataset consists of 540 original images of casting parts with metallic projection defects, categorized into three classes: Burr, Sand, and Bean. To address data scarcity, we apply extensive data augmentation techniques, including adaptive histogram equalization, horizontal flipping, random contrast adjustment, random masking, and random translation. This expands the dataset to 3,240 images. We split the data into training, validation, and test sets in an 8:1:1 ratio. Table 2 shows the distribution of images after augmentation.

| Class | Original Images | Augmented Images (Total) | Training Set | Validation Set | Test Set |
|---|---|---|---|---|---|
| Burr | 180 | 1,080 | 864 | 108 | 108 |
| Sand | 180 | 1,080 | 864 | 108 | 108 |
| Bean | 180 | 1,080 | 864 | 108 | 108 |
| Total | 540 | 3,240 | 2,592 | 324 | 324 |
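Two of the listed augmentations, horizontal flipping and random translation, can be illustrated on a tiny grayscale "image" stored as nested lists. This is a minimal sketch of the operations only; the actual pipeline would operate on real images via an image-processing library, and the translation offset here is fixed rather than random.

```python
# Illustrative sketch of horizontal flip and (rightward) translation, two of
# the augmentations listed above, on a toy 2x3 grayscale image.

def hflip(img):
    # mirror each row left-to-right
    return [row[::-1] for row in img]

def translate(img, dx, fill=0):
    # shift each row right by dx pixels, padding the left edge with `fill`
    return [[fill] * dx + row[:len(row) - dx] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))          # [[3, 2, 1], [6, 5, 4]]
print(translate(img, 1))   # [[0, 1, 2], [0, 4, 5]]
```

Each augmented copy keeps the defect content while varying its position and orientation, which is what lets a 540-image dataset train a detector without severe overfitting.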

The experiments are conducted on a Windows system with an RTX 4090 GPU (24 GB) and an Intel Xeon Platinum 8375C CPU. We set the input image size to 640×640 pixels, batch size to 16, epochs to 100, and workers to 8. The optimizer is SGD with an initial learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. We disable mosaic augmentation during training to ensure stability. Evaluation metrics include precision (P), recall (R), mean average precision (mAP), true detection rate (TD), false detection rate (FD), and omit detection rate (OD). Precision and recall are defined as:

$$ P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN} $$

where TP, FP, and FN are true positives, false positives, and false negatives, respectively. mAP is computed as the average of AP across all classes at an IoU threshold of 0.5. TD, FD, and OD are derived from test set predictions to assess practical detection performance.
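The precision/recall definitions above are straightforward to compute from confusion counts. The sketch below applies them to hypothetical example counts; the matching of predictions to ground truth at an IoU threshold, which produces TP/FP/FN in practice, is omitted.

```python
# Sketch of the precision/recall computation defined above, applied to toy
# confusion counts; prediction-to-ground-truth matching is omitted.

def precision_recall(tp, fp, fn):
    p = tp / (tp + fp) if (tp + fp) else 0.0   # P = TP / (TP + FP)
    r = tp / (tp + fn) if (tp + fn) else 0.0   # R = TP / (TP + FN)
    return p, r

# hypothetical example: 90 correct detections, 10 false alarms, 5 misses
p, r = precision_recall(tp=90, fp=10, fn=5)
print(round(p, 3), round(r, 3))  # 0.9 0.947
```

AP then summarizes the precision-recall curve for one class across confidence thresholds, and mAP@0.5 averages AP over the three defect classes.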

We compare our YOLOv8-LQ model with several variants: baseline YOLOv8, YOLOv8-Ghost (which uses Ghost modules for efficiency), YOLOv8-RevCol (with reversible columns), and YOLOv8-SlimNeck (with a slim neck design). The results are presented in Table 3.

| Model | mAP (%) | Precision (%) | Recall (%) | TD (%) | FD (%) | OD (%) |
|---|---|---|---|---|---|---|
| Baseline YOLOv8 | 79.7 | 98.8 | 97.8 | 78.8 | 14.8 | 6.4 |
| YOLOv8-Ghost | 74.2 | 98.0 | 96.0 | 84.8 | 13.6 | 1.6 |
| YOLOv8-RevCol | 72.0 | 97.2 | 95.9 | 87.2 | 12.4 | 0.4 |
| YOLOv8-SlimNeck | 75.8 | 97.1 | 97.9 | 88.4 | 11.2 | 0.4 |
| YOLOv8-LQ (Ours) | 81.9 | 98.7 | 98.3 | 94.9 | 5.0 | 0.1 |

Our model achieves the highest mAP of 81.9%, outperforming the baseline by 2.2%. Moreover, the true detection rate (TD) reaches 94.9%, which is 16.1% higher than the baseline, indicating that our model can correctly identify more defects in casting parts. The false detection rate (FD) drops to 5.0%, showing reduced misclassifications. These improvements demonstrate the effectiveness of our architectural changes for metallic projection detection.

To further validate each component, we conduct ablation studies, as shown in Table 4. We start with the baseline and incrementally add improvements: ResNet backbone, C2f-Star neck, Dynamic Head, and MPDIoU loss. The results confirm that each contribution enhances performance.

| Method | ResNet | C2f-Star | Dynamic Head | MPDIoU | mAP (%) | FLOPs (×10^9) | Params (×10^6) |
|---|---|---|---|---|---|---|---|
| Expt1 (Baseline) | × | × | × | × | 79.7 | 8.9 | 3.15 |
| Expt2 | ✓ | × | × | × | 80.0 | 8.2 | 3.08 |
| Expt3 | ✓ | ✓ | × | × | 81.7 | 8.6 | 2.94 |
| Expt4 | ✓ | ✓ | ✓ | × | 82.0 | 8.1 | 3.02 |
| Expt5 (Full) | ✓ | ✓ | ✓ | ✓ | 81.9 | 8.2 | 2.92 |

In Expt5, our full model achieves an mAP of 81.9% with reduced FLOPs and parameters compared to the baseline, highlighting its efficiency. The MPDIoU loss slightly adjusts the mAP but improves bounding box accuracy, as evidenced by the higher TD and lower FD. We also analyze detection results visually. For instance, in images with multiple small defects, such as sand inclusions or burrs, our model produces bounding boxes with higher confidence scores (e.g., 0.95 vs. 0.83 in baseline) and better coverage. This is crucial for casting part inspection where missing even a small defect can lead to product failure.

The success of our model can be attributed to several factors. First, the ResNet backbone provides deeper feature extraction without gradient issues, capturing fine details of casting part defects. Second, the C2f-Star neck reduces computational cost while preserving multi-scale information, which is vital for detecting defects of varying sizes. Third, the Dynamic Head enhances focus on relevant regions and scales, making the model robust to background clutter. Fourth, the MPDIoU loss optimizes bounding box alignment, improving localization precision. Together, these elements make YOLOv8-LQ a powerful tool for automated defect detection in casting parts.

However, there are limitations. Our dataset, though augmented, may not cover all possible variations in casting part defects, such as those from different materials or manufacturing processes. Future work could involve collecting more diverse data or using synthetic data generation techniques. Additionally, the model’s performance in real-time video streams needs to be evaluated for integration into production lines. We also plan to explore lightweight versions for edge deployment and extend the approach to other defect types like cracks or porosity in casting parts.

In conclusion, we have presented an improved YOLOv8 model for detecting metallic projection defects in casting parts. By incorporating ResNet, C2f-Star, Dynamic Head, and MPDIoU loss, our model achieves higher accuracy and efficiency compared to existing methods. Experimental results on a custom dataset show significant gains in mAP and true detection rate, demonstrating its potential for industrial quality control. As casting parts continue to be integral to various industries, automated inspection systems like ours can enhance productivity, reduce waste, and ensure safety. We hope this work inspires further research in deep learning-based defect detection for manufacturing applications.

Throughout this paper, we have emphasized the importance of casting part quality and how advanced computer vision techniques can address detection challenges. Our model is a step toward more reliable and efficient inspection systems, contributing to the broader goal of smart manufacturing. We encourage practitioners to adopt such methods and adapt them to their specific needs, ultimately driving innovation in the casting industry.
