Enhanced Defect Detection Using Tensor CNN

Tensor 25/07/2024

3 main points

✔️ Introduces tensor convolutional neural networks (T-CNNs) for defect detection, a critical issue in manufacturing.
✔️ Validates their performance in a real application of defect detection in ultrasonic sensor components.
✔️ Quantum-inspired T-CNNs significantly improve training speed and performance compared to comparable CNN models by reducing the model parameter space.

Boosting Defect Detection in Manufacturing using Tensor Convolutional Neural Networks
written by Pablo Martin-Ramiro,Unai Sainz de la Maza,Sukhbinder Singh,Roman Orus,Samuel Mugel
[Submitted on 29 Dec 2023 (v1), last revised 26 Apr 2024 (this version, v2)]
Comments: Accepted by arXiv
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Defect detection is an important and challenging issue in the manufacturing industry. This study introduces a tensor convolutional neural network (T-CNN) and validates its performance in a real-world application of defect detection in ultrasonic sensor components produced in a Robert Bosch manufacturing plant. The authors' quantum-inspired T-CNN significantly improves training speed and performance compared to comparable CNN models by reducing the model parameter space. Specifically, the authors show that T-CNNs can achieve the same performance with up to 15 times fewer parameters and 4% to 19% faster training time. The authors' results show that T-CNNs significantly outperform traditional human visual inspection results and provide value in real-world manufacturing applications.

Introduction

In manufacturing, distinguishing high-quality parts from defective parts is critical during the product assembly process. This task can be especially difficult on mass production manufacturing lines with complex structures. Traditionally, quality control has been performed by experienced human inspectors who visually inspect images of the product. This method requires the analysis of hundreds to thousands of images per hour; fatigue affects the results, which are also subjective and difficult to quantify. Therefore, it is important to go beyond human inspection and automate the quality control process to improve the accuracy and performance of defect detection and reduce the number of misclassified products.

The big data revolution has led to the development of new algorithms and technologies that take advantage of large amounts of data. Modern deep learning techniques have been widely applied in a variety of fields, including image classification, object recognition, and object detection. Convolutional neural networks (CNNs) have been very successful in image classification due to their ability to extract and learn the most relevant features (colors, shapes, and other patterns) of images from different classes. They have been particularly successful in critical applications such as defect detection in the manufacturing industry. However, defect detection in real-world environments is a very challenging task. Defects are often minute, diverse, and difficult to identify, requiring model architectures that are sufficiently complex to capture these minute features. Such architectures usually come with a large number of parameters, which creates bottlenecks in the speed and accuracy of CNNs: training and inference demand more computational resources and time, and the quality of the results can degrade.

In areas such as energy (smart grids), healthcare (patient monitoring), telecommunications (mobile networks), and manufacturing (quality control and predictive maintenance), it has been shown to be important to place models as close to the data source as possible. This is called edge computing. For these applications, it is important to develop models with excellent performance, high computational efficiency, and low resource usage. This makes it possible to deploy models on small edge computing devices and FPGA devices. Therefore, it is important to reduce the number of CNN parameters without sacrificing performance.

Reducing the number of parameters in a CNN is not as easy as it may seem. Blindly removing parameters, regardless of the amount of information they have learned, can result in a significant loss of accuracy and misclassification of images. One of the most common approaches to efficiently reduce network parameters is pruning, which removes weights and filters with small values that contribute little to the information learned by the network. The key to reducing the number of parameters in a CNN is to target the correct portion of the parameter space and remove only those parameters that are least important for learning. Quantum-inspired tensor network methods are prime candidates to perform this task efficiently and systematically, because they allow efficient decomposition of the large tensors used in most modern machine learning techniques. For example, tensor decomposition methods such as the canonical polyadic (CP) decomposition, the singular value decomposition (SVD), and its extension to higher-order tensors, the Tucker decomposition, can factorize multidimensional tensors and discard the parts of the original tensor that carry little correlation. These factorization methods are used in a variety of fields, including image recognition, component analysis, dictionary learning, and regression modeling.
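As a minimal illustration of the pruning idea described above, the following sketch zeroes out the smallest-magnitude entries of a weight array. The function name `magnitude_prune`, the `sparsity` parameter, and the random 8x8 matrix are hypothetical choices for illustration; this is generic magnitude pruning, not a method taken from the paper.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to remove (0.0 keeps everything).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of entries to zero out
    if k == 0:
        return weights.copy()
    # Indices of the k smallest |w| in the flattened array.
    drop = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
    pruned = weights.copy().ravel()
    pruned[drop] = 0.0
    return pruned.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))                 # hypothetical weight matrix
w_pruned = magnitude_prune(w, sparsity=0.75)
kept = int(np.count_nonzero(w_pruned))      # entries that survive pruning
```

Pruning of this kind looks only at individual weights, whereas the tensor decompositions discussed in this article remove redundancy by exploiting correlations across the whole weight tensor.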

In this study, the authors used quantum-inspired tensor network methods and tensor decomposition ideas to improve the efficiency of the CNN and reduce the number of parameters in the trainable weight tensors by retaining only the critical network parameters. The value of the resulting tensor convolutional neural network (T-CNN) is demonstrated in a real-world image-based defect detection application in manufacturing.

To demonstrate the potential of T-CNN, this study tested T-CNN in a real-world quality control application detecting defective ultrasonic sensor components on a Robert Bosch production line. Ultrasonic sensors are designed for fast and accurate obstacle detection at short distances, assisting maneuvering and parking in tight spaces, and providing emergency braking capabilities at low speeds by responding quickly to obstacles. They are also used for collision avoidance of automated moving equipment in logistics, construction, and agriculture. Due to the widespread use of these sensors in everyday life, the production of high-quality ultrasonic sensors is of great importance. To this end, the authors built a T-CNN to detect defects in manufactured ultrasonic sensor components and applied it to an image dataset containing thousands of examples collected from several production lines. The authors' results show that T-CNN significantly outperforms traditional visual inspection in quality control, matching traditional CNNs on the same quality metrics while offering significant advantages in terms of number of parameters and training time.

Related Research

Image Classification and Object Recognition

Deep learning techniques are used in many areas, including image classification, object recognition, and object detection. CNNs excel in their ability to extract and learn the most relevant features (colors, shapes, and other patterns) of an image.

Defect detection in the manufacturing industry

CNNs have also been used successfully in defect detection in manufacturing and have been the subject of many studies. Tabernik et al. used a segmentation-based deep learning approach to detect surface defects in manufacturing.

Use of edge computing

Many fields use edge computing, which places models close to the data source. Applications include smart grids, healthcare, mobile networking, and quality control and predictive maintenance in manufacturing.

Parameter reduction for CNN

CNNs are often over-parameterized, creating a bottleneck of computational resources when working with large data sets. Pruning methods are widely used to efficiently reduce CNN parameters.

Applications of Tensor Network Methodology

Quantum-inspired tensor network methods allow efficient decomposition of tensors and are a promising method for reducing the number of parameters in CNNs. Tensor decomposition methods (CP decomposition, SVD, Tucker decomposition) are used in image recognition, component analysis, dictionary learning, and regression models.

Application in real-world manufacturing environments

Most studies have been developed and tested on standard data sets and are not fully representative of performance in real-world manufacturing environments. This study uses tensor decomposition techniques to improve the efficiency of CNNs and demonstrates their value in a real-world image-based defect detection application in manufacturing.

Issue Overview

Here is an overview of the problem and the characteristics of the data set. The product part under inspection is a piezoelectric element, to which two wires are welded. This welding process can involve microscopic defects, and it is important to detect these defects before proceeding to the next manufacturing step. Therefore, the problem is naturally formulated as a binary image classification problem to distinguish between high quality and defective parts. The following subsections describe the details of the dataset and the approach of this study.

Dataset Description

With the advance of Big Data and Industry 4.0, manufacturing companies are collecting vast amounts of data that is continuously generated during the production process. The dataset used in this study is a small fraction of this data, manually labeled for the purpose of training a supervised model to identify defective parts. The dataset contains a total of 11,728 labeled images, each with a resolution of 1280x1024 pixels. There are nine types of defects present in the product parts, each labeled with a number from 1 to 9, as follows

1. Damage to piezoelectric element
2. Weak weld
3. Strong weld
4. Weld misalignment
5. Misalignment of piezoelectric element
6. Foreign substance
7. Wire breakage
8. Unevaluable image
9. Shorted wire, poor wire length, or no wire

This problem presents multiple challenges. First, the data is collected from multiple twin production lines, where variations in lighting conditions and subtle differences in camera positions can cause the data distribution to vary widely. This makes the model training process difficult and may introduce undesirable biases. In addition, the different types of defects and their absolute numbers on each production line make it difficult to train a single model that will maintain good performance across all production lines. Finally, piezoelectric elements may come from different sources, making identification even more difficult because the defects may be of different construction and materials.

These data characteristics can have a distinct impact on performance. To overcome these challenges, the data preprocessing strategies described in the following subsection are implemented and data augmentation strategies are applied during training.

Problem Formulation

Because the dataset consists of a collection of images collected from several production lines, the distribution of each defect type and the absolute number of each varies. This results in insufficient data for some defect classes. To solve this problem, all defect types are combined into a single class and formulated as a binary classification problem.

In addition, a data preprocessing stage is introduced to compensate for differences in lighting conditions on each production line and make colors more uniform across all images. Specifically, the contrast of all images is increased, and the shape and boundaries of the piezoelectric elements are emphasized to help distinguish between high-quality and defective parts. Finally, all images are resized to a resolution of 256x256 pixels and pixel values are normalized.
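The preprocessing steps above (contrast increase, resizing to 256x256, normalization) can be sketched as follows. The article does not say which contrast method, resize filter, or normalization statistics the authors used, so the percentile stretch, block-average downsampling, per-image standardization, and the grayscale input are all assumptions made for illustration, as is the function name `preprocess`.

```python
import numpy as np

def preprocess(image: np.ndarray, out_size: int = 256) -> np.ndarray:
    """Contrast-stretch, downsample, and normalize one grayscale image.

    Assumes `image` is a 2-D uint8 array whose height and width are integer
    multiples of `out_size` (true for 1024 rows x 1280 columns -> 256x256).
    """
    img = image.astype(np.float64)
    # Assumed contrast step: map the 2nd..98th percentile range onto 0..1.
    lo, hi = np.percentile(img, [2, 98])
    img = np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    # Assumed resize step: average non-overlapping blocks (4x5 pixels here).
    h, w = img.shape
    fh, fw = h // out_size, w // out_size
    img = img.reshape(out_size, fh, out_size, fw).mean(axis=(1, 3))
    # Assumed normalization: zero mean and unit variance per image.
    return (img - img.mean()) / (img.std() + 1e-8)

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(1024, 1280), dtype=np.uint8)  # synthetic stand-in image
out = preprocess(raw)
```

A real pipeline would more likely use an image library for resizing and dataset-wide statistics for normalization; the sketch only shows the order of operations.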

Tensor convolutional neural network

This section describes how quantum-inspired tensor network methods can be used to improve the efficiency of CNN architectures. First, the basic concepts of tensor networks are introduced, followed by a method for building T-CNNs and an explanation of how the number of parameters can be reduced compared to CNNs.

Convolutional Neural Networks (CNN)

This subsection provides an overview of CNNs, describing the components of their architecture and the role of the convolutional layer in feature extraction. A classic CNN consists of two main components

Feature extraction network: extracts the most relevant features of input images and data through multiple convolutional and pooling layers.
Classification network: processes the learned features in a sequence of fully connected layers to predict labels for images and data.

Each convolution layer learns specific features such as color, edges, and cracks. The pooling layer reduces the size of the learned representation and replaces blocks of data with average or maximum values. The learned features are fed into the classification network and used for final label prediction.

However, processing complex image structures may require the use of many convolution layers. This increases the number of parameters in the model, increases training time, and may result in misclassification and loss of accuracy. Therefore, it is important to strike a balance between the expressive power of the network and the number of parameters. Ideally, only parameters that contain key information should be retained and redundant parameters should be removed so that they do not affect the performance and accuracy of the model.

Basic Concepts of Tensor Networks

Tensor networks (TNs) originally emerged in physics to provide an efficient representation of the ground state of quantum systems. TNs have high potential for efficient representation and compression of data. In particular, TNs have been very successful in machine learning tasks such as classification, clustering, anomaly detection, and differential equation solving.

A tensor is a multi-dimensional array of complex numbers; the number of dimensions (indices) is the rank of the tensor [see Figure 1]. A tensor network diagram is a visual notation for representing tensors: a rank-n tensor is drawn as an object with n connected lines (legs). This notation avoids much of the index bookkeeping in the mathematical representation of tensors.

Figure 1. Tensor network diagram. Each rank-n tensor is represented by an object with n connected legs, each representing an individual dimension of the tensor. Scalars, vectors, matrices, and rank-n tensors have 0, 1, 2, and n connected legs, respectively.

Tensor contraction (a generalization of matrix multiplication) is also represented in tensor network diagrams. For example, the contraction of two rank-2 tensors (matrices) is represented by connecting their shared line [see Figure 2].

Figure 2. Contraction of a tensor. A contraction is equivalent to a trace over the shared indices and is represented by connecting the shared legs. Here, the R and S tensors are connected along the shared leg β; this contraction is equivalent to matrix multiplication.
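The contraction in Figure 2 can be written with NumPy's `einsum`, where each letter stands for one leg of a tensor and a repeated letter marks the shared leg that is summed over. The shapes and the names `A`, `B`, `C`, `T` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(size=(3, 4))   # legs (alpha, beta)
S = rng.normal(size=(4, 5))   # legs (beta, gamma)

# Contract over the shared leg beta: T[a, g] = sum_b R[a, b] * S[b, g].
T = np.einsum("ab,bg->ag", R, S)

# For two matrices this is ordinary matrix multiplication.
same_as_matmul = np.allclose(T, R @ S)

# The same notation covers higher ranks: a rank-3 tensor contracted with a
# matrix over one shared leg leaves a rank-3 result.
A = rng.normal(size=(2, 3, 4))
B = rng.normal(size=(4, 6))
C = np.einsum("ijk,kl->ijl", A, B)   # shape (2, 3, 6)
```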

One important technique for tensor decomposition is singular value decomposition (SVD), which decomposes an input matrix into the product of two unitary matrices and a diagonal matrix of singular values. For higher-order tensors, SVD is extended by the Tucker decomposition [see Figure 3].

Figure 3. Decomposition of a matrix into two rank-3 tensors by SVD, also called a matrix product operator (MPO).
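To make the compression aspect of SVD concrete, the sketch below keeps only the `chi` largest singular values of a matrix and compares parameter counts. The matrix sizes, the noise level, and the helper name `truncated_svd` are assumptions chosen for illustration; the test matrix is built to be nearly rank 4 so that truncation loses very little.

```python
import numpy as np

def truncated_svd(M: np.ndarray, chi: int):
    """Return the rank-`chi` factors (U, s, Vh) of M from its SVD."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :chi], s[:chi], Vh[:chi, :]

rng = np.random.default_rng(0)
# A 64x48 matrix that is rank 4 up to small noise.
M = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 48))
M = M + 1e-3 * rng.normal(size=(64, 48))

U, s, Vh = truncated_svd(M, chi=4)
M_approx = (U * s) @ Vh                       # rebuild from the kept factors
rel_error = np.linalg.norm(M - M_approx) / np.linalg.norm(M)

params_full = M.size                          # entries in the dense matrix
params_low_rank = U.size + s.size + Vh.size   # entries in the truncated factors
```

The truncation rank plays the same role here that the factorization rank χ plays in the Tucker decomposition discussed in the next subsection: smaller ranks mean fewer parameters and a coarser approximation.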

Tensorization of CNN

Next, we describe how to tensorize a CNN using a tensor network and the Tucker decomposition. Consider a 2D CNN, in which each convolutional layer has a rank-4 weight tensor [see Figure 6] (in a 3D CNN, the corresponding weight tensor would have rank 5). Training a CNN amounts to finding the optimal parameters of the weight tensor of each layer.

Tensor decomposition is applied to factor the weight tensor and remove redundant portions while retaining the most relevant information. The Tucker decomposition approximates the original tensor as a product of a core tensor and four factor matrices, and the dimensions of the factor matrices control the compression ratio. A CNN tensorized in this way is called a T-CNN.

Figure 4. HOSVD (Tucker) factorization of a convolution weight tensor into a core tensor and four factor matrices. χ is the factorization (truncation) rank of the factor matrices.

T-CNN training is performed using automatic differentiation and backpropagation via gradient descent. Unlike classic CNNs, T-CNNs update four smaller factor matrices and one core tensor at each convolution layer, rather than one large rank-4 weight tensor. Training of the tensorized model is done directly in the compressed space of the new tensor representation.
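The Tucker format of a convolution weight can be sketched in NumPy as a core tensor contracted with four factor matrices. The channel counts (128 in, 128 out), the helper names, and the random initial values are hypothetical; only the rank tuple (32, 32, 3, 3) appears in the article. This is a sketch of the data structure, not the authors' implementation.

```python
import numpy as np

def tucker_conv_weight(core, U_out, U_in, U_h, U_w):
    """Rebuild a dense (C_out, C_in, h, w) weight from a Tucker core and four factors."""
    return np.einsum("abcd,oa,ib,hc,wd->oihw", core, U_out, U_in, U_h, U_w)

def tucker_param_count(c_out, c_in, h, w, ranks):
    """Number of trainable entries: core tensor plus the four factor matrices."""
    r_out, r_in, r_h, r_w = ranks
    return r_out * r_in * r_h * r_w + c_out * r_out + c_in * r_in + h * r_h + w * r_w

c_out, c_in, h, w = 128, 128, 3, 3      # hypothetical layer size
ranks = (32, 32, 3, 3)                  # one of the rank settings in the article

rng = np.random.default_rng(0)
core = rng.normal(size=ranks)
U_out = rng.normal(size=(c_out, ranks[0]))
U_in = rng.normal(size=(c_in, ranks[1]))
U_h = rng.normal(size=(h, ranks[2]))
U_w = rng.normal(size=(w, ranks[3]))

W = tucker_conv_weight(core, U_out, U_in, U_h, U_w)   # dense equivalent weight
dense_params = c_out * c_in * h * w
tucker_params = tucker_param_count(c_out, c_in, h, w, ranks)
```

In a T-CNN, the five small arrays are the trainable parameters; the dense weight `W` is rebuilt here only to show that the factorized form represents a tensor of the usual convolution shape.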

Parameter Counting

In classic CNNs, model parameters fall into three main categories:

Parameters of the convolution layers (Nc)
Bias parameters (Nb)
Parameters of the classification layers (Nr)

The total number of CNN parameters (NCNN) is the sum of these three categories, NCNN = Nc + Nb + Nr.

The number of parameters in a tensorized convolution layer is the number of entries in its core tensor plus the number of entries in its four factor matrices.

The total number of parameters in the T-CNN (NT-CNN) is obtained by replacing Nc in this sum with the parameter count of the tensorized convolution layers.

The parameter compression ratio (Cr) compares NCNN with NT-CNN.
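As a hedged sketch of how these counts fit together, the expressions below follow from the structure of a Tucker-factorized convolution layer. The symbols $C_{out}$, $C_{in}$, $h$, $w$ (layer dimensions), $\chi_{out}$, $\chi_{in}$, $\chi_h$, $\chi_w$ (factorization ranks), and $N_{tc}$ are notation assumed here, and the sums run over convolution layers $\ell$ (layer indices on the dimensions are omitted for readability). The direction of the ratio $C_r$ is also an assumption, chosen to agree with the "times fewer parameters" figures quoted in this article.

```latex
\begin{align}
N_{\mathrm{CNN}} &= N_c + N_b + N_r,
  \qquad N_c = \sum_{\ell} C_{out}\, C_{in}\, h\, w \\
N_{tc} &= \sum_{\ell} \left( \chi_{out}\,\chi_{in}\,\chi_h\,\chi_w
  + C_{out}\,\chi_{out} + C_{in}\,\chi_{in} + h\,\chi_h + w\,\chi_w \right) \\
N_{\mathrm{T\text{-}CNN}} &= N_{tc} + N_b + N_r \\
C_r &= \frac{N_{\mathrm{CNN}}}{N_{\mathrm{T\text{-}CNN}}}
\end{align}
```

The first term of $N_{tc}$ counts the core tensor and the remaining four terms count the factor matrices, so lowering the ranks shrinks the convolution part of the model while $N_b$ and $N_r$ stay unchanged.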

Experimental Setup

This section describes the experimental setup used to build, train, and test the model in a defect detection application.

Model Architecture

A simplified version of the standard VGG16 architecture was used as a reference model. The structure of this CNN model was incrementally tuned and optimized for optimal performance. Based on the optimized CNN model, this architecture was tensorized and a new T-CNN model was trained from scratch, replacing the regular convolutional layers with tensor convolutional layers [see Figure 1]. The rank settings of each layer are optimized as hyperparameters.

Various tensor decomposition schemes can be used to construct T-CNNs, including MPS, Tucker decomposition, and CP decomposition, but this study found that the Tucker decomposition provides the most stable training and superior results. In addition, the classification layer of the CNN can also be tensorized using standard tensorization schemes. In particular, the tensor regression layer is an interesting approach because it directly connects the outputs of the convolutional layers without flattening them. However, the introduction of a tensor regression layer was not beneficial to the performance metrics in this study and therefore was not included in the final model.

Training Setup

All models were implemented using PyTorch, and mixed precision training was used to reduce memory usage and computational requirements. All models were trained for 80 epochs using the Adam optimizer. The learning rate was varied from $3 \times 10^{-4}$ to $1 \times 10^{-6}$ using a multi-step fixed learning rate scheduler. Models were trained on an NVIDIA T4 GPU with 16 GB of memory, a device designed to accelerate deep learning training and inference.

The entire data set was divided into three parts, 80% for training, 10% for validation, and 10% for testing. Thus, of the 11,728 labeled images, 9,382 were used for training, 1,173 for validation, and the remaining 1,173 for testing. The first split was used only for model training, the second for model selection and hyperparameter tuning, and the last split was used to test model performance on unseen data and to generate all results.
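The split sizes quoted above can be reproduced with a few lines of Python. The rounding rule (truncate the training share, round the validation share, give the remainder to the test set), the shuffling seed, and the function name `split_indices` are assumptions chosen so that the counts match the article; the authors' actual splitting procedure is not described.

```python
import random

def split_indices(n_total, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/validation/test lists."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)        # fixed seed for a reproducible split
    n_train = int(n_total * train_frac)     # truncate the training share
    n_val = round(n_total * val_frac)       # round the validation share
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]            # remainder goes to the test set
    return train, val, test

train, val, test = split_indices(11728)
sizes = (len(train), len(val), len(test))
```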

Current deep neural network models require large amounts of data to obtain good results and to prevent overfitting. For this reason, to increase data diversity during training, the authors implemented data augmentation, in which existing data is slightly modified to create new training examples. This process suppresses spurious correlations specific to each production line (such as residual color patterns or slight differences in camera orientation). Specifically, a combination of random color changes, resized crops, and cutout augmentation was applied to all images and tuned for best performance. Together with the preprocessing stage, this augmentation makes the model more robust to the specific features of each production line.
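Two of the augmentations named above can be sketched in NumPy. The patch size, brightness range, and function names are hypothetical, and a simple brightness shift stands in for the unspecified "random color changes"; the authors' actual augmentation settings are not given in the article.

```python
import numpy as np

def cutout(image: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `image` with one random size x size square set to zero."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)     # upper bound is exclusive
    left = rng.integers(0, w - size + 1)
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out

def random_brightness(image: np.ndarray, max_delta: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Shift all pixel values by one random offset and clip to [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(image + delta, 0.0, 1.0)

rng = np.random.default_rng(0)
img = np.ones((256, 256))                   # synthetic all-white image in [0, 1]
img_cut = cutout(img, size=32, rng=rng)
img_aug = random_brightness(img_cut, max_delta=0.2, rng=rng)
```

Because a new random patch and offset are drawn every time an image is loaded, the network rarely sees exactly the same input twice, which is what discourages it from memorizing line-specific details.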

Additionally, because the original dataset contains more images of normal products than of defective products, the model may learn to correctly identify only the majority class (normal products). To mitigate this class imbalance, the authors collected additional images of defective products and used a weighted random sampling technique that gives more importance to the minority class.
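Weighted random sampling can be sketched with the Python standard library: each example is drawn with probability inversely proportional to the size of its class, so both classes appear about equally often in training batches. The 900/100 class split and the function names are hypothetical. PyTorch provides `torch.utils.data.WeightedRandomSampler` for the same purpose, although the article does not state which implementation the authors used.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each example a weight of 1 / (number of examples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def weighted_sample(labels, num_samples, seed=0):
    """Draw example indices with replacement, favoring the minority class."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

labels = [0] * 900 + [1] * 100      # hypothetical: 0 = normal, 1 = defective
drawn = weighted_sample(labels, num_samples=10_000)
frac_defective = sum(labels[i] for i in drawn) / len(drawn)
```

With these weights each class carries the same total probability mass, so roughly half of the drawn examples are defective even though defective images make up only 10% of this toy dataset.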

Performance index

In unbalanced data sets, performance measures such as accuracy can be misleading. Therefore, this study evaluated model performance using precision, recall, and F1 score as quality indicators. Taking the defective class as the positive class, precision is the fraction of images flagged as defective that are truly defective, recall is the fraction of truly defective images that are flagged, and the F1 score is the harmonic mean of the two.

In addition, the authors introduced a "slip-through" metric, defined as the percentage of defective images that escape detection in the quality control process.

To measure the efficiency of a neural network, the authors chose metrics that combine compression ratio and training time improvement. Training time improvement (T) is defined as the training time advantage of the T-CNN (the CNN training time minus the T-CNN training time) divided by the training time of the CNN.
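The metrics in this subsection can be computed from confusion-matrix counts as sketched below. Precision, recall, and F1 use their standard definitions with "defective" as the positive class. The slip-through formula (missed defects divided by all defective images) and the training time improvement formula are readings of the verbal definitions in this article, and the counts and times in the example are made-up numbers.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of images flagged as defective that are truly defective."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of truly defective images that are flagged."""
    return tp / (tp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def slip_through(tp: int, fn: int) -> float:
    """Assumed definition: fraction of defective images that escape detection."""
    return fn / (tp + fn)

def training_time_improvement(t_cnn: float, t_tcnn: float) -> float:
    """Training time saved by the T-CNN, relative to the CNN training time."""
    return (t_cnn - t_tcnn) / t_cnn

# Made-up example: 90 defects caught, 10 false alarms, 10 defects missed.
tp, fp, fn = 90, 10, 10
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(tp, fp, fn)
slip = slip_through(tp, fn)
t_gain = training_time_improvement(t_cnn=100.0, t_tcnn=84.0)
```

Under the assumed definition, slip-through equals one minus recall, which is why lowering the decision threshold (flagging more images as defective) reduces slip-through at the cost of precision.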

Results

This section provides a detailed analysis of the performance of T-CNN in classifying high quality and defective piezoelectric components.

T-CNN Performance

In this subsection, we analyze the performance of T-CNNs with different tensor convolution layer rank settings compared to classical CNNs in terms of quality metrics, number of parameters, and training time.

Model details and rank setting

Each tensor convolution layer is parameterized by four ranks: one for the input channels ($r_{in}$), one for the output channels ($r_{out}$), and two for the convolution filter size ($h$ and $w$) [see Table 1]. Several T-CNN models are built, each with a fixed rank setting that serves as the maximum rank (upper bound) for every tensor convolution layer.

Table 1. Performance of T-CNN models with multiple rank configurations, measured by quality metrics, compression ratio, and training time improvement over the optimized CNN. Columns 2 through 5 show the ranks of the four dimensions. For each model, the results are presented as averages over 20 different random seeds, and thus 20 different network initializations, with uncertainty corresponding to one standard deviation.

Table 1 shows the detailed analysis of five T-CNN models with different rank settings for the tensor convolution layer. The rank settings for each model are (96, 96, 3, 3), (64, 64, 3, 3), (32, 32, 3, 3), (16, 16, 3, 3), and (8, 8, 3, 3). These settings represent a balance between quality index, compression ratio, and training time.

Quality Indicators and Compression Ratio

To evaluate the performance of the T-CNNs, precision, recall, and F1 score were used as quality indicators. The results in Table 1 show that the T-CNN with rank (96, 96, 3, 3) matches the performance of the optimized CNN with 20% fewer parameters and similar training time. Furthermore, T-CNNs of ranks (64, 64, 3, 3) and (32, 32, 3, 3) achieve similar quality metrics while significantly reducing the number of parameters and the training time.

The results suggest that the compressed parameter space has an advantage over that of CNNs because the tensor convolution layers effectively capture key correlations in the data and preserve essential information while ignoring noise. Lower-rank T-CNNs exhibit a tradeoff between quality metrics and computational efficiency, achieving very high compression ratios at the cost of a slight degradation in performance.

Error Analysis

This subsection presents an error analysis of the optimal T-CNN model, whose results are shown in Table 2. The goal is to gain insights for future model improvements.

Table 2. Performance of the optimal T-CNN model, with rank configuration (32, 32, 3, 3), on the test data, as measured by quality metrics, compression ratio, and training time improvement. The AUC metric is also included as a measure of model performance that is independent of the decision threshold. The results of the best CNN are shown for comparison. In all cases, the selected model is the one with the best F1 score on the validation data. All metrics are fixed This is key to lower slip-through. For reference, the estimated slip-through for human inspection in a typical production line shift is 10%.

Causes of misclassification

A detailed analysis of the misclassified defect images led to the following conclusions

Some images could be correctly classified by lowering the decision threshold to reduce slip-through.
Some images have been mislabeled and the sample needs to be re-annotated.
Images belonging to the classes of weak welds, strong welds, out-of-position welds, and foreign bodies were found to be difficult to identify.

To reduce these misclassifications, it is important to collect more data for these defect classes and add it to the training set. In addition, the labels of the entire data set need to be reviewed.

Optimal T-CNN model performance

The optimal T-CNN model achieves quality metrics comparable to the classic CNN at rank (32, 32, 3, 3), with 4.6 times fewer parameters and 16% less training time. This indicates that the compressed parameter space effectively captures key correlations and ignores unwanted information. Furthermore, the model significantly outperforms human inspection by reducing the slip-through rate of defect detection in the quality control process from 10% to 4.6%.

These results provide important economic and quality assurance benefits in manufacturing quality control systems and improve product quality and reliability. Integrating T-CNN models into quality control systems also reduces the burden on human inspectors and improves efficiency and productivity.

Conclusion and Outlook

Automating quality control and detecting imperfect and defective products in mass production is one of the most important and challenging tasks in manufacturing. Traditional methods rely on human visual inspection of product images, which is error-prone because defects are subtle and difficult to identify. This degrades the quality of the results and increases the burden on the human inspector. In this study, tensor network methods were integrated into a CNN structure to create a tensor convolutional neural network (T-CNN). The T-CNN model was used to automate and improve defect detection in an actual quality control process at a Robert Bosch manufacturing plant.

The T-CNN model is built by replacing the convolution layers of the classic CNN with tensor convolution layers based on higher-order tensor decomposition. The rank-4 weight tensor of each convolutional layer of the classic CNN is decomposed into four factor matrices and one core tensor. The interconnection dimension of the factorized tensors controls the data compression ratio and the amount of correlation between the factorized parts. The authors trained the T-CNN model from scratch in the compressed parameter space of the new tensor representation. The results show that T-CNNs achieve performance comparable to classical CNNs, with the two advantages of significantly fewer parameters and shorter training time. These advantages provide important benefits in terms of computational cost, training time, robustness, and interpretability.

The authors' results show that the higher-rank T-CNNs provide modest improvements in both parameter count and training time while achieving the same performance as classic CNNs in quality metrics such as precision, recall, and F1 score. T-CNNs with lower ranks reduce training time by significantly reducing the number of parameters while maintaining the quality metrics. In particular, T-CNNs with a maximum rank of (32, 32, 3, 3) for each layer have 4.6 times fewer parameters and 16% shorter training time. These results suggest that the compressed parameter space defined by the tensor convolution layers is superior to classical CNNs in capturing the main correlations in the data and ignoring unwanted noise.

Furthermore, the authors' model performed significantly better than human inspection, reducing the percentage of defect images missing detection from 10% to 4.6%. In a typical production line, this 54% improvement translates into cost savings and improved product quality and reliability. In addition, integrating the T-CNN model into a quality control system frees human inspectors from monotonous and time-consuming visual inspection tasks and puts human resources into areas where creative and problem-solving skills are required. This is expected to increase efficiency and productivity.

The results suggest that the advantages of T-CNNs carry over to other data sets and that T-CNNs have great potential for integration into quality control processes for real-time defect detection in manufacturing. T-CNN models are fast, accurate, and efficient, and can be deployed on mobile devices, edge computing devices, and small devices such as FPGAs. In addition, the low parameter count of T-CNNs improves energy efficiency, making them suitable for devices with small energy resources and batteries.

Finally, the factorization rank of the T-CNN acts as a hyperparameter, controlling the number of parameters in the tensor convolution layer. Although this study used multiple rank settings for extensive analysis, systematic hyperparameter optimization is recommended for optimal performance.

友安昌幸 (Masayuki Tomoyasu): JDLA G certificate 2020#2, E certificate2021#1 Japan Society of Data Scientists, DS Certificate Japan Society for Innovation Fusion, DX Certification Expert Amiko Consulting LLC, CEO