Small Target Defect Detection in Casting Part DR Images Based on YOLOv8

In modern manufacturing, demand for casting parts has surged, accompanied by increasingly strict requirements for product quality. To improve the durability of casting parts and reduce the frequency of accidents, internal defects in these components must be detected. During production, defects such as gas pores, inclusions, and shrinkage porosity can arise from factors such as the casting process, part structure, and raw material quality. Traditional defect detection often relies on manual inspection, which is inefficient and prone to overlooking small defects, leading to missed detections. With the rapid advancement of artificial intelligence, machine vision technology has been widely applied to target detection. Machine vision can replace manual labor by analyzing and processing images, enabling high-precision, high-speed detection at lower cost. Machine vision methods fall into two categories: traditional methods and deep learning-based methods. Traditional methods require designing specific algorithms for different features, resulting in low stability and development efficiency, whereas deep learning-based techniques learn features from samples through neural networks and offer stronger robustness.

Target detection, a deep learning-based defect detection technology, has been widely used in X-ray image defect detection. For instance, some studies have combined laser ultrasound with CNNs for defect detection in casting parts. Others have introduced guided filtering to enhance defects in DR images and improved the YOLOv3 network structure to increase the accuracy of detecting small defects in casting parts. An efficient detection method for casting part defects has also been proposed that replaces the backbone network of the original YOLOv4 algorithm with EfficientNet to save computational resources while maintaining accuracy. Further improvements to the YOLOv5 algorithm have enhanced feature extraction, boosting the detection of defects in aluminum alloy wheel casting parts. Currently, deep learning-based target detection algorithms are generally divided into two types: regression-based single-stage algorithms and candidate-region-based two-stage algorithms. Single-stage algorithms offer faster detection speeds while ensuring accuracy, providing greater practical application value. The YOLO series, as a representative single-stage family, has evolved rapidly with excellent detection performance and attracted significant attention. This study therefore aims to enhance the detection of small target defects in casting part DR images by optimizing the image data and improving the YOLOv8 algorithm through replacing the convolution method, introducing an attention mechanism, and adding a small target detection layer.

Image Data Preprocessing

The experimental data consists of 16-bit grayscale images in which defects are often subtle and difficult to discern with the naked eye, so image enhancement is needed to make defects more visible. In addition, since grayscale images have only one effective channel while deep learning networks typically read three-channel images, we expand the grayscale images with additional informative channels derived from the original image to enrich the information.

Image Enhancement

The study focuses on 16-bit casting part DR images. These have a wide pixel range, but defects typically occupy only a narrow band of gray values, making them hard to detect and localize directly. In practice, the window width and window level are adjusted to enhance local contrast and improve defect visibility: the window width is the range of pixel values mapped to the display, and the window level is the midpoint of that range. However, the detail visible in the image varies with the chosen window width and level, so frontline workers can rarely identify all defects under a single setting, and the appropriate window level for defects may differ across casting part images.
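To make the windowing operation concrete, the following minimal NumPy sketch maps a 16-bit image into an 8-bit display range from a given window level and width (the function and the level/width values are illustrative, not taken from the study):

```python
import numpy as np

def apply_window(img16, level, width):
    """Display a 16-bit image under a given window level (center) and width (range)."""
    lo = level - width / 2.0                         # lowest displayed gray value
    hi = level + width / 2.0                         # highest displayed gray value
    out = np.clip(img16.astype(np.float32), lo, hi)  # discard values outside the window
    return ((out - lo) / max(hi - lo, 1e-6) * 255.0).astype(np.uint8)

# Example: a narrow window centered on the gray band where a defect appears
# (values are illustrative).
img = np.random.randint(0, 65536, (512, 512), dtype=np.uint16)
display = apply_window(img, level=32000, width=4000)
```

A narrow width stretches the contrast inside the chosen band, which is why a single setting rarely reveals every defect.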

In digital image processing, histogram equalization is a common image enhancement technique that stretches the gray values of an image to enhance contrast and improve visual quality. However, the image data used in this study belongs to 16-bit depth industrial DR images, characterized by small defect targets and low contrast, making defects difficult to observe. Ordinary histogram equalization may not effectively highlight image details to reveal defect information. Therefore, we adopt the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm to enhance the original images.

CLAHE is a technique used to enhance image details by dividing the image into several equally sized sub-images, performing histogram equalization on each sub-image, and then interpolating the locally equalized images to eliminate brightness jumps between small regions, stitching them into a complete image. The steps of the CLAHE algorithm are as follows:

First, the image is divided into sub-images of equal size, and histogram equalization is applied to each sub-image. The gray histogram is denoted as \( h(i) \), and the clipping amplitude \( T \) is calculated as:

$$ T = \frac{C_{\text{clip}} \cdot N_x \cdot N_y}{M} $$

where \( C_{\text{clip}} \) is the clipping coefficient, \( N_x \) and \( N_y \) are the width and height in pixels of each sub-image, and \( M \) is the number of gray levels in the corresponding sub-image.

Based on the calculated clipping amplitude \( T \), each histogram \( h(i) \) is clipped. The number of pixels \( N \) exceeding the amplitude is evenly distributed across each gray level, with each gray level receiving \( N_{\text{ave}} \) pixels:

$$ N = \sum_{i=0}^{M-1} \max[h(i) - T, 0] $$
$$ N_{\text{ave}} = \frac{N}{M} $$

The redistributed histogram \( H(i) \) is obtained as:

$$ H(i) = \begin{cases} T + N_{\text{ave}} & \text{if } h(i) \geq T \\ h(i) + N_{\text{ave}} & \text{if } h(i) < T \end{cases} $$
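As a concrete illustration of the clipping and redistribution step, here is a minimal NumPy sketch that mirrors the three equations above (variable names follow the text; this is not the exact implementation used in the study):

```python
import numpy as np

def clip_histogram(h, c_clip, nx, ny):
    """Clip sub-image histogram h and redistribute the excess pixels."""
    m = len(h)                           # number of gray levels M
    t = c_clip * nx * ny / m             # clipping amplitude T
    n = np.maximum(h - t, 0).sum()       # N: total pixels above the amplitude
    n_ave = n / m                        # N_ave: share added to every gray level
    return np.where(h >= t, t + n_ave, h + n_ave)  # redistributed histogram H(i)
```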

Finally, the locally equalized images are interpolated to eliminate brightness jumps between regions and stitched into the complete image. This process effectively enhances small target defect details, making target information more visible and easier to observe.
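In practice the whole CLAHE pipeline is available off the shelf. A minimal sketch using OpenCV, whose CLAHE implementation accepts 16-bit single-channel input directly (the clip limit, tile grid, and file path below are illustrative placeholders):

```python
import cv2

# Read the 16-bit DR image unchanged (single-channel uint16).
img16 = cv2.imread("casting_dr.tif", cv2.IMREAD_UNCHANGED)

# The tile grid and contrast limit correspond to the sub-image division
# and clipping steps described above.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img16)
```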

Channel Expansion

The casting part images in this study are grayscale: their gray matrix has a single channel, unlike the three channels of a color image. Since the YOLOv8 model typically reads three-channel images, a grayscale image is normally expanded into three identical copies of its gray matrix. The two duplicated channels carry no new information, so compared with a color image this expansion wastes two channels. Given that casting part defects have distinct contour features, we instead apply the Sobel operator to the original grayscale image to compute the gray-gradient changes in the X and Y directions, and fuse the results with the original grayscale image to form a three-channel image. This turns the single-channel grayscale image into a three-channel image fused with edge features, expanding the image dimensionality and providing more learnable features than the original grayscale image.

The Sobel operator is used for edge detection by convolving the image with small, separable, and integer-valued filters. The gradients in the X and Y directions are calculated as:

$$ G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I $$
$$ G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I $$

where \( I \) is the original grayscale image, and \( * \) denotes the convolution operation. The magnitude of the gradient is often computed as \( G = \sqrt{G_x^2 + G_y^2} \), but for channel expansion, we use \( G_x \) and \( G_y \) as additional channels. Thus, the expanded image \( I_{\text{expanded}} \) has three channels: the original grayscale channel \( I \), the X-direction gradient channel \( G_x \), and the Y-direction gradient channel \( G_y \). This enriches the feature representation for subsequent deep learning processing.
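A minimal OpenCV/NumPy sketch of this channel expansion (the normalization of each plane to 8-bit and the channel ordering are design assumptions, not mandated by the text):

```python
import cv2
import numpy as np

def expand_channels(gray16):
    """Fuse a grayscale DR image with its Sobel gradients into a 3-channel image."""
    g = gray16.astype(np.float32)
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=3)  # X-direction gradient G_x
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=3)  # Y-direction gradient G_y

    def to8(a):  # normalize each plane to 8-bit for a standard 3-channel input
        return cv2.normalize(a, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    return np.dstack([to8(g), to8(gx), to8(gy)])  # channels: (I, G_x, G_y)
```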

YOLOv8 Principles and Improvements

The YOLOv8 algorithm builds upon YOLOv5 with enhancements to the backbone network, the feature enhancement and extraction stage, and the prediction head, offering faster training and improved detection performance. In the backbone network, YOLOv8 reduces the kernel size of the first convolution to 3×3 and replaces all C3 modules with C2f modules, adding skip connections that fuse features from different layers so the network captures richer gradient flow information. In the feature enhancement and extraction stage, a bidirectional PANet structure shortens the information path between top and bottom layers, making it easier for top layers to obtain bottom-layer features. In the prediction head, YOLOv8 adopts a decoupled head structure that separates the classification and regression tasks, and switches from an anchor-based to an anchor-free design to improve model recognition capability.

However, due to the small target size and varied shapes of the detection targets in casting part defects, some defects may not be easily detected. To make the algorithm more suitable for casting part defect detection, we have made improvements in three aspects: convolution method, attention mechanism, and detection head.

Dilated Convolution

In the YOLOv8 network structure, the first layer uses a 3×3 convolution kernel, which has a small receptive field and limited ability to distinguish small targets. Larger convolution kernels can provide a larger receptive field but may ignore details, reducing the network’s discrimination ability. Therefore, we introduce dilated convolution (also known as atrous convolution) to replace the first convolution layer. Dilated convolution injects holes into standard convolution, enlarging the local receptive field without losing internal data information. For a dilation rate of \( r \), the effective kernel size becomes \( k + (k-1)(r-1) \), where \( k \) is the original kernel size. For example, with a kernel size of 3 and dilation rate of 2, the receptive field expands to 5×5 without increasing parameters. This enhances feature extraction for small target defects in casting part images.

The dilated convolution operation can be expressed as:

$$ y[i] = \sum_{k} x[i + r \cdot k] \cdot w[k] $$

where \( x \) is the input, \( w \) is the kernel, \( r \) is the dilation rate, and \( y \) is the output. This allows the network to capture broader context while maintaining resolution, beneficial for detecting small defects in casting parts.
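A hedged PyTorch sketch of such a replacement layer (stride 1 is used here for clarity; the stem convolution in YOLOv8 itself uses stride 2, and the channel counts are placeholders):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation 2 has a 5x5 effective receptive field
# (k + (k-1)(r-1) = 3 + 2*1 = 5) at the parameter count of a plain 3x3.
# padding=2 preserves the spatial size at stride 1.
dilated = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=2, dilation=2)

x = torch.randn(1, 3, 640, 640)
print(dilated(x).shape)  # torch.Size([1, 16, 640, 640])
```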

SimAM Attention Mechanism

To further improve model detection performance, we introduce an attention mechanism. Attention mechanisms optimize model detection by helping the model focus on important information in the input, enhancing effective features while suppressing redundant or useless information, thereby highlighting key features. Given that the images are fused with gradient information, we add the SimAM attention mechanism to the network. SimAM is a 3D attention mechanism that combines spatial and channel information. It evaluates each neuron by defining an energy function \( e_i^* \) to determine the importance of each independent neuron:

$$ e_i^* = \frac{4(\hat{\sigma}^2 + \lambda)}{(t_i - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda} $$

where \( \lambda \) is a regularization term, \( t_i \) is the \( i \)-th neuron on a single channel of the input feature map, \( \hat{\mu} \) is the mean of all neurons on a single channel, and \( \hat{\sigma}^2 \) is the variance of all neurons on a single channel. The calculations for \( \hat{\mu} \) and \( \hat{\sigma}^2 \) are:

$$ \hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} x_i $$
$$ \hat{\sigma}^2 = \frac{1}{M} \sum_{i=1}^{M} (x_i - \hat{\mu})^2 $$

where \( M \) is the number of neurons. A smaller \( e_i^* \) value indicates that the neuron is more distinct from surrounding neurons, more linearly separable, and thus more important. Therefore, we use \( 1/e_i^* \) as a weight coefficient and apply the sigmoid activation function to limit the value range, obtaining the new output feature map \( X' \):

$$ X' = \text{sigmoid}\left(\frac{1}{E}\right) \cdot X $$

where \( X \) is the input feature map, and \( E \) is the set of all \( e_i^* \) values for the neurons in the input feature map. The SimAM module enhances feature layers from the backbone network, allowing effective features to be more fully utilized, which is crucial for learning various target features in casting part defect detection.
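A compact PyTorch sketch of the mechanism, following the publicly available SimAM reference implementation (the default \( \lambda = 10^{-4} \) comes from the SimAM paper and may differ from the setting used in this study):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free 3D attention based on the energy function above."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam  # regularization term lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)  # (t_i - mu)^2 per channel
        v = d.sum(dim=(2, 3), keepdim=True) / n            # channel-wise variance
        e_inv = d / (4 * (v + self.lam)) + 0.5             # proportional to 1/e_i*
        return x * torch.sigmoid(e_inv)                    # reweight input features
```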

Adding Small Target Detection Layer

In the YOLOv8 network, deeper layers have larger receptive fields, which can impair the recognition of small targets. To strengthen the model’s learning performance for small targets, we add a small target detection layer by incorporating larger-sized feature layers from the backbone into the neck network for feature learning. These features are enhanced via the SimAM module and then stacked with upsampled lower-layer features before passing through a C2f module to the detection layer. This modification increases the model’s ability to detect small defects in casting part images.
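An illustrative PyTorch sketch of how the extra branch could be wired (channel sizes and feature names are placeholders; a 1×1 convolution stands in for the C2f module, and SimAM is the module sketched above):

```python
import torch
import torch.nn as nn

simam = SimAM()                                # from the previous sketch
upsample = nn.Upsample(scale_factor=2, mode="nearest")
fuse = nn.Conv2d(64 + 128, 64, kernel_size=1)  # stand-in for the C2f module

p2 = torch.randn(1, 64, 160, 160)     # high-resolution backbone feature (640/4)
p3_neck = torch.randn(1, 128, 80, 80) # neck feature one level deeper

# Enhance the backbone feature, stack it with the upsampled neck feature,
# and fuse; the result feeds the new fourth (small-target) detection layer.
out = fuse(torch.cat([simam(p2), upsample(p3_neck)], dim=1))
print(out.shape)  # torch.Size([1, 64, 160, 160])
```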

The improved network structure integrates these changes, as summarized in the following table detailing the modifications:

| Module | Original YOLOv8 | Improved YOLOv8 | Purpose |
| --- | --- | --- | --- |
| First convolution | 3×3 standard convolution | 3×3 dilated convolution (rate = 2) | Increase receptive field for small targets |
| Attention mechanism | None | SimAM added to backbone and neck | Enhance important features |
| Detection layers | Three detection layers | Four detection layers (additional small target layer) | Improve small target detection accuracy |
| Channel expansion | Automatic grayscale-to-3-channel duplication | Manual expansion with Sobel gradients | Enrich feature information |

This table highlights the key improvements made to adapt YOLOv8 for casting part defect detection, particularly focusing on small targets.

Experimental Analysis and Validation

The experiments were conducted on a Windows 10 system with an Intel i5-11400 CPU and an NVIDIA RTX 3060 GPU, using the PyTorch 1.8.1 deep learning framework and Python 3.8. The dataset consists of X-ray DR images of casting parts from a company, covering three types of defects: gas pores, inclusions, and shrinkage porosity. These defects manifest differently in DR images: gas pores typically appear as circular shapes, inclusions as irregular polygons with distinct edges, and shrinkage porosity as dendritic or spongy shapes.

Since the original DR images contain various casting part structures of inconsistent sizes, they were cropped into 640 px × 640 px sub-images, and those containing defects were used to build the dataset. The initial dataset held 700 images. To increase dataset diversity and improve model generalization, data augmentation was performed through rotation and flipping, which do not compromise target authenticity, expanding the dataset to 3,493 images. The dataset was split into training and testing sets in an 8:2 ratio, with 20% of the training set used for validation. Data augmentation effectively increases the morphological diversity of the targets and enhances model robustness, leading to better detection performance.
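A minimal sketch of the tiling and augmentation steps (the non-overlapping stride and this particular set of rotations and flips are assumptions; the study only states that images were cropped to 640 px × 640 px and augmented by rotation and flipping):

```python
import numpy as np

def crop_tiles(img, size=640):
    """Split a large DR image into non-overlapping size x size tiles."""
    h, w = img.shape[:2]
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def augment(tile):
    """Rotation and flip variants that preserve defect authenticity."""
    return [tile, np.rot90(tile), np.rot90(tile, 2), np.rot90(tile, 3),
            np.fliplr(tile), np.flipud(tile)]
```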

The distribution of defect sizes relative to the original image is shown in the table below, indicating that most defects are small targets:

| Defect Type | Width Ratio Range | Height Ratio Range | Percentage of Small Targets (Ratio < 0.1) |
| --- | --- | --- | --- |
| Gas pores | 0.02–0.08 | 0.01–0.07 | 85% |
| Inclusions | 0.03–0.09 | 0.02–0.08 | 80% |
| Shrinkage porosity | 0.04–0.10 | 0.03–0.09 | 75% |

Small targets are defined as those with width or height ratios less than 0.1 relative to the image dimensions, confirming that the primary objects in this study are small target defects in casting parts.

Experimental Results and Analysis

We trained the image data using YOLO series algorithms and our improved YOLOv8 algorithm. The training image size was 640 px × 640 px, with the SGD optimizer, an initial learning rate of 0.01, an optimizer weight decay of 0.0005, a batch size of 16, and 300 epochs. To validate the optimization effect of the improved YOLOv8 algorithm on casting part defect detection, we used Precision (P), Recall (R), Average Precision (AP), mean Average Precision (mAP), and mAP@0.5:0.95 as evaluation metrics. Model predictions yield four outcomes: True Positives (TP) are positive samples correctly detected, True Negatives (TN) are negative samples correctly rejected, False Positives (FP) are negative samples incorrectly predicted as positive, and False Negatives (FN) are positive samples the model misses. Precision and Recall are calculated as:

$$ P = \frac{TP}{TP + FP} $$
$$ R = \frac{TP}{TP + FN} $$

The Average Precision (AP) is the area under the Precision-Recall curve, and mAP is the mean of AP values across all classes at an Intersection over Union (IoU) threshold of 0.5. mAP@0.5:0.95 averages mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
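For reference, the training settings listed above map directly onto the Ultralytics YOLOv8 training API; a hedged sketch follows (the model and dataset file names are placeholders standing in for the improved network definition and the casting defect dataset):

```python
from ultralytics import YOLO

model = YOLO("yolov8-improved.yaml")  # placeholder config for the improved network
model.train(
    data="defects.yaml",   # placeholder dataset description
    imgsz=640,             # 640 x 640 training size
    epochs=300,
    batch=16,
    optimizer="SGD",
    lr0=0.01,              # initial learning rate
    weight_decay=0.0005,
)
```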

The following table compares the Precision and Recall of different algorithms on the test set:

| Model | Defect Type | Precision (%) | Recall (%) |
| --- | --- | --- | --- |
| YOLOv5 | Gas pores | 74.1 | 69.5 |
| YOLOv5 | Inclusions | 70.3 | 78.3 |
| YOLOv5 | Shrinkage porosity | 79.8 | 76.6 |
| YOLOv7 | Gas pores | 63.7 | 71.5 |
| YOLOv7 | Inclusions | 61.1 | 67.4 |
| YOLOv7 | Shrinkage porosity | 66.4 | 56.8 |
| YOLOv8 | Gas pores | 77.8 | 71.1 |
| YOLOv8 | Inclusions | 69.0 | 86.1 |
| YOLOv8 | Shrinkage porosity | 82.7 | 82.2 |
| Our improved YOLOv8 | Gas pores | 79.7 | 76.6 |
| Our improved YOLOv8 | Inclusions | 78.0 | 80.3 |
| Our improved YOLOv8 | Shrinkage porosity | 86.1 | 81.7 |

This table shows that YOLOv5 performs better overall than YOLOv7 for the three defect types, and YOLOv8 further improves detection rates due to enhanced feature extraction. Our improved YOLOv8 increases Precision for all defect types and Recall for gas pores, but Recall for inclusions and shrinkage porosity slightly decreases, making it difficult to comprehensively evaluate the improvement. Therefore, we use AP values for a more balanced analysis, as shown in the next table.

The following table presents the AP and mAP values for different models:

| Model | Gas Pores AP (%) | Inclusions AP (%) | Shrinkage Porosity AP (%) | mAP (%) | mAP@0.5:0.95 (%) |
| --- | --- | --- | --- | --- | --- |
| YOLOv5 | 80.9 | 79.5 | 84.2 | 81.5 | 47.8 |
| YOLOv7 | 72.7 | 65.6 | 64.7 | 67.7 | 33.2 |
| YOLOv8 | 83.0 | 81.0 | 87.0 | 83.7 | 47.7 |
| Our improved YOLOv8 | 86.1 | 84.7 | 87.3 | 86.1 | 52.5 |

In the dataset, some gas pores and inclusions have similar sizes and appearances, making them difficult for deep learning networks to distinguish and prone to misjudgment. Our improved YOLOv8 algorithm strengthens target feature information and enhances the network's ability to capture important features, improving discriminability for similar defects. The AP values for gas pores and inclusions thus increase by 3.73% and 4.57% respectively relative to the original YOLOv8 model, while the shrinkage porosity AP improves slightly, by 0.3 percentage points. The mAP rises from 83.7% to 86.1%, a relative increase of 2.87%, and mAP@0.5:0.95 improves from 47.7% to 52.5%, a relative gain of 10.10%, indicating higher localization accuracy and more precise bounding boxes for casting part defects.

Ablation Study

Our improvements involve two aspects: image data enhancement and algorithm modifications. Image data enhancement refers to channel expansion of grayscale images, converting single-channel images to three-channel images. Algorithm improvements include replacing the convolution method with dilated convolution, adding the SimAM attention mechanism, and incorporating a small target detection layer. To verify the effectiveness of each improvement, we conducted an ablation study, with results summarized in the table below:

| Channel Expansion | Dilated Convolution | SimAM | Small Target Detection Layer | mAP (%) | mAP@0.5:0.95 (%) |
| --- | --- | --- | --- | --- | --- |
| No | No | No | No | 83.7 | 47.7 |
| Yes | No | No | No | 83.9 | 50.8 |
| No | Yes | No | No | 83.5 | 49.0 |
| No | No | Yes | No | 83.3 | 48.6 |
| No | No | No | Yes | 82.9 | 48.0 |
| Yes | Yes | No | No | 85.7 | 51.4 |
| Yes | No | Yes | No | 84.8 | 50.3 |
| Yes | No | No | Yes | 83.9 | 50.3 |
| No | Yes | Yes | Yes | 84.9 | 51.6 |
| Yes | Yes | Yes | Yes | 86.1 | 52.5 |

This table demonstrates that using our channel expansion method yields higher mean average precision compared to the network’s automatic expansion into three identical channels. Individually applying dilated convolution, SimAM, or the small target detection layer slightly reduces mAP at IoU=0.5 but improves mAP@0.5:0.95, indicating better overall detection accuracy across different IoU thresholds. When all three algorithm improvements are combined, mAP and mAP@0.5:0.95 increase by 2.39% and 7.76% respectively relative to the original model, validating the effectiveness of the algorithm modifications. Further, with channel expansion added on top of the algorithm improvements, mAP and mAP@0.5:0.95 rise by an additional 0.48% and 2.31%, confirming the benefits of channel expansion for casting part defect detection.

Model Prediction Results

We compared the prediction results of the original YOLOv8 model and our improved YOLOv8 model. The confidence scores indicate the model’s certainty in detecting a target, with higher values representing greater likelihood. The improved YOLOv8 successfully detects defects that were missed by the original model, demonstrating enhanced detection capability and higher accuracy for small target defects in casting parts. For example, in sample images, the improved model identifies additional gas pores and inclusions with confidence scores above 0.8, whereas the original model had lower recall for these instances. This underscores the practical value of our modifications in industrial applications for casting part quality control.

Conclusion

In this study, we introduced an improved YOLOv8 algorithm for small target defect detection in casting part DR images. For image processing, we used the CLAHE algorithm to enhance contrast and highlight defect information, followed by computing gray gradients and fusing them with the original image to expand channel information. For algorithm improvements, we replaced the first convolution layer with dilated convolution to increase the receptive field, incorporated the SimAM attention mechanism to strengthen important features, and added a small target detection layer to enhance small target detection capability. The results show that our improved YOLOv8 algorithm achieves a mean average precision (mAP) of 86.1% and mAP@0.5:0.95 of 52.5% on the channel-expanded dataset, representing improvements of 2.87% and 10.10% respectively over the original model. This demonstrates higher precision in detecting small target defects in casting parts, contributing to more reliable quality assurance in manufacturing processes. Future work could explore further optimizations, such as integrating multi-scale feature fusion or adapting the approach to other types of casting part inspections, to broaden the applicability of deep learning in industrial defect detection.
