[DiffYOLO] Innovative Framework Improves Object Detection With Low Quality Data

Computer Vision 18/03/2024

3 main points
✔️ Object detection techniques play an important role in the field of image processing and computer vision.
✔️ We propose a framework called DiffYOLO that improves the accuracy of object detection on low-quality data sets.
✔️ Using features extracted from a pre-trained diffusion model is shown to give better performance than the ordinary detector.

DiffYOLO: Object Detection for Anti-Noise via YOLO and Diffusion Models
written by Yichen Liu, Huajian Zhang, Daqing Gao
(Submitted on 3 Jan 2024)
Comments: Published on arXiv.
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Object detection technology plays an important role in image processing and computer vision. Models such as the YOLO series have attracted particular attention for their high performance and efficiency. In real-world situations, however, not all data is of high quality, and accurately detecting objects in low-quality data sets is considerably harder. This paper proposes a framework called DiffYOLO to address the problem, with the aim of improving the accuracy of object detection on low-quality data sets.

Introduction

In recent years, YOLO has been widely used for object detection in a variety of fields, including autonomous driving and medical image processing. For example, Alice Freudevaux et al. detected vehicles in satellite images, Sudipto Paul et al. recognized brain tumors in MRI images, and Ethan Gruby et al. automatically detected facial landmarks. However, object detection models, including YOLO, still have difficulty detecting objects accurately in noisy images, and models trained on high-quality data may not perform well on noisy test sets. This paper therefore proposes a framework called DiffYOLO, which takes existing models trained on high-quality data and improves their performance on noisy test sets. The framework extracts features from pre-trained diffusion models and incorporates them into existing object detection models to improve their tolerance to noise. Experimental results show that the proposed method improves performance on noisy images. Because the diffusion model is already pre-trained, the method is expected to achieve higher accuracy with fewer resources.

Figure: DiffYOLO overview chart

Related Research

Object Detection

Object detection is one of the fundamental tasks of computer vision, and many methods exist. There are two-stage methods, such as R-CNN and Fast R-CNN, and one-stage methods, such as YOLO. Since YOLOv1, the YOLO family has developed into better and faster models, such as YOLOX and PP-YOLOE. This paper uses YOLOv5 as the base model and aims to improve its performance in noisy environments.

Diffusion model

Diffusion models are designed to generate clean data from random noise. Unlike earlier generative models, a diffusion model works step by step, with each step using a deep network to remove a little of the noise. This study shows that a diffusion model can make other models more resistant to noise.

Anti-noise

Pre-trained models are easy to obtain, but they generally expect clear images, and real images are often not clear. For example, when images are sent from industrial sites, transmission problems, fog, or dark weather can introduce noise. Methods such as NoisyNet and IA-YOLO have been proposed to deal with this noise.

Proposed Method

With the ordinary YOLO model (YOLOv5), noise in the image degrades object detection. For example, it is difficult to detect objects in rain or fog. The paper therefore proposes a new idea to enable accurate detection of objects even in noisy situations.

First, the diffusion process is explained. It consists of two parts: a forward process that gradually adds noise to an image, and a reverse process that removes the noise step by step.
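The two processes can be illustrated with a minimal NumPy sketch. It assumes the standard DDPM closed form for the forward process and a linear noise schedule; the article specifies neither, and the function names are hypothetical. In a real diffusion model the noise estimate comes from a trained network, so the true noise is used below only as a stand-in.

```python
import numpy as np

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T (linear schedule, an assumption)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, betas, rng):
    """Forward process: sample x_t from q(x_t | x_0) in closed form."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

def predict_x0(xt, eps_hat, t, betas):
    """Reverse direction: estimate x_0 from x_t and a noise estimate eps_hat."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    return (xt - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

rng = np.random.default_rng(0)
betas = linear_betas(1000)
x0 = rng.uniform(-1.0, 1.0, size=(3, 64, 64))   # stand-in for a clean image
xt, eps = forward_diffuse(x0, t=500, betas=betas, rng=rng)

# With a perfect noise estimate the clean image is recovered exactly.
x0_hat = predict_x0(xt, eps, t=500, betas=betas)
```

The better the network's noise estimate, the closer `x0_hat` is to the clean image, which is why the intermediate activations of such a network are a plausible source of noise-aware features.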

Next, the idea proposed by Dhariwal & Nichol (2021) is used to extract features from the image, that is, to find the important parts of the image. The resulting features are robust to noise.

Finally, these noise-robust features are used to train the ordinary YOLO model, which allows it to detect objects accurately even in noisy images. With this method, the model does not need to be retrained on noisy data, which saves time and makes the approach usable in more situations.
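The article does not say how the diffusion features enter the detector. One plausible reading, shown here purely as an assumption, is that a diffusion feature map is resized to the detector's spatial size and concatenated along the channel axis. The shapes and function names below are hypothetical and are not taken from the paper.

```python
import numpy as np

def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]

def fuse_features(det_feat, diff_feat):
    """Concatenate detector and diffusion features along the channel axis."""
    _, h, w = det_feat.shape
    return np.concatenate([det_feat, resize_nearest(diff_feat, h, w)], axis=0)

rng = np.random.default_rng(0)
det_feat = rng.standard_normal((256, 40, 40))    # hypothetical detector feature map
diff_feat = rng.standard_normal((128, 20, 20))   # hypothetical diffusion feature map
fused = fuse_features(det_feat, diff_feat)       # shape (384, 40, 40)
```

In a real network the fused tensor would normally pass through a learned layer, such as a 1x1 convolution, before reaching the detection head.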

Experiment

The method was evaluated on defect detection for PCBs (printed circuit boards) and compared with the ordinary YOLOv5 model.

Dataset

The dataset, DeepPCB, contains 1,500 real photographs covering common PCB defects (e.g., broken wires, metal contact). In this experiment, the model was trained on high-quality photos and tested on photos with various types of noise added.

Experimental results

In practice, some functionality was disabled so that the model could be trained more efficiently. Rather than generating the diffusion features during training, the authors computed them in advance, stored them, and loaded them into the model as needed.

The following tables show the results: (a) detection results for the YOLOv5 model, and (b) detection results for the DiffYOLO model.
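Pre-storing features in this way amounts to a simple disk cache. The sketch below is one way to do it with NumPy files; the file layout, the function names, and the fake extractor are all assumptions for illustration, not the authors' code.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_or_compute(image_id, cache_dir, compute_fn):
    """Return cached features for image_id, computing and saving them on a miss."""
    path = Path(cache_dir) / f"{image_id}.npy"
    if path.exists():
        return np.load(path)
    feat = compute_fn(image_id)
    np.save(path, feat)
    return feat

calls = []

def fake_extractor(image_id):
    """Stand-in for an expensive diffusion feature extractor (hypothetical)."""
    calls.append(image_id)
    return np.full((4, 8, 8), float(len(image_id)), dtype=np.float32)

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_compute("pcb_0001", cache_dir, fake_extractor)   # computed
    second = load_or_compute("pcb_0001", cache_dir, fake_extractor)  # read from disk
```

The extractor runs only once per image, so the cost of the diffusion model is paid before training rather than in every epoch.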

Table 1: Detection results for high quality data sets

This table compares the performance of both models on high-quality data sets.

Table 2: Detection results under Gaussian noise

Gaussian noise adds a random value, drawn from a Gaussian distribution with mean 0, to each pixel value. This produces fine grain and subtle color changes throughout the image.
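A minimal sketch of adding zero-mean Gaussian noise to an 8-bit image is shown below. The standard deviation of 25 is an arbitrary value chosen for illustration; the article does not state the noise level used in the experiments.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to a uint8 image."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
gray = np.full((64, 64), 128, dtype=np.uint8)     # flat mid-gray test image
noisy_gray = add_gaussian_noise(gray, sigma=25.0, rng=rng)
```

Because the noise has mean 0, the average brightness of the image stays roughly the same while individual pixels vary.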

Table 3: Detection results with salt and pepper noise

Salt and pepper noise sets randomly chosen pixels in an image to pure white or pure black. Bright and dark spots are scattered throughout the image, degrading its overall quality.
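Salt and pepper noise can be sketched as follows. The corruption rate `amount` is an illustrative parameter that is split evenly between black and white pixels; the article does not give the rate used in the experiments.

```python
import numpy as np

def add_salt_and_pepper(image, amount, rng):
    """Set a fraction `amount` of pixels to black (pepper) or white (salt)."""
    noisy = image.copy()
    u = rng.random(image.shape[:2])          # one draw per pixel location
    noisy[u < amount / 2] = 0                # pepper
    noisy[u > 1 - amount / 2] = 255          # salt
    return noisy

rng = np.random.default_rng(0)
gray = np.full((100, 100), 128, dtype=np.uint8)
noisy_gray = add_salt_and_pepper(gray, amount=0.1, rng=rng)
changed = float(np.mean(noisy_gray != gray))   # fraction of corrupted pixels
```

Unlike Gaussian noise, most pixels are left untouched and the corrupted ones take extreme values.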

Table 4: Detection results under Poisson noise

Poisson noise is often seen in images taken under low light conditions. It is caused by random variations in light intensity that follow a Poisson distribution. These random variations in brightness degrade the quality of the image.
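Poisson noise can be simulated by treating each pixel value as an expected photon count. In the sketch below, `peak` is a hypothetical parameter giving the photon count that corresponds to full brightness; smaller values mimic darker scenes and give stronger noise. The article does not describe how the noise was generated in the experiments.

```python
import numpy as np

def add_poisson_noise(image, peak, rng):
    """Simulate photon shot noise; `peak` is the photon count at value 255."""
    expected = image.astype(np.float64) / 255.0 * peak
    noisy = rng.poisson(expected) / peak * 255.0
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
gray = np.full((64, 64), 128, dtype=np.uint8)
low_light = add_poisson_noise(gray, peak=5.0, rng=rng)      # few photons, strong noise
bright = add_poisson_noise(gray, peak=100.0, rng=rng)       # many photons, mild noise
```

The noise strength depends on the signal itself, which is the main difference from Gaussian noise.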

Each table shows the performance of the two models on either the high-quality data set or one type of noise, so comparing the results shows how each model behaves under different conditions. When noise is added, the performance of both models decreases, but DiffYOLO outperforms the baseline. In other words, incorporating diffusion features can make a model more resistant to noise. The method is therefore useful not only for finding defects but also for improving the performance of the model itself.

Conclusion

In this paper, a new method was proposed to improve the accuracy of object detection. Experimental results showed that using features extracted from a pre-trained diffusion model gives better performance than the ordinary detector. This allows objects to be detected accurately in noisy situations using models trained on high-quality images. However, the approach has limitations when the computational resources for running the diffusion model are insufficient or when the data is prone to change.

In the future, it is hoped that a simpler method will be found to overcome these limitations, and that this approach will be used more widely and will further advance object detection.

Sasayama