
Adversarial Domain Adaptation To Address The Problem Of Lack Of Labeled Training Data


3 main points
✔️ A novel adversarial domain adaptation approach that supports heterogeneous adaptation when the source domain has different characteristics than the target domain
✔️ Combining the domain adaptation approach with an autoencoder-based data augmentation approach addresses the problem of imbalance in the target dataset
✔️ Demonstrates superior performance over other algorithms when the number of labeled samples in the target dataset is significantly low and the target dataset is imbalanced

Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation
written by Adrian Shuai Li, Elisa Bertino, Rih-Teng Wu, Ting-Yan Wu
[Submitted on 31 May 2023]
Comments: Published on arxiv.
Subjects: 
Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

The paper uses image data to predict wafer defects in the semiconductor manufacturing process. It should be of interest to industry professionals, and other readers may also find this work on the manufacturing of the semiconductor devices that underpin IT, including AI, worth reading.

Deep learning (DL) techniques are very effective in detecting defects in images. However, training DL classification models requires a huge amount of labeled data, which is often expensive to collect. In many cases, available training data is not only limited but also potentially imbalanced. In this paper, we propose a novel domain adaptation (DA) approach to address the problem of lack of labeled training data for target learning tasks by transferring knowledge obtained from existing source data sets used for similar learning tasks. The approach is effective for scenarios where the source dataset and the dataset available for the target learning task have the same or different feature spaces. Combining the authors' DA approach with an autoencoder-based data augmentation approach addresses the problem of unbalanced target datasets. The authors evaluate their combined approach using image data for wafer defect prediction. Experimental results show superior performance over other algorithms when the number of labeled samples in the target dataset is significantly low and the target dataset is unbalanced.

Introduction

Defect detection is an important manufacturing process, but it is often expensive in terms of manpower and time. For example, in semiconductor wafer fabrication, microscopic images of the wafer surface must be scanned and manually inspected by operators for the presence of defects. Another example is the analysis of crystal size distribution in solutions used in the food industry, which is done manually by an operator using a microscope. It is therefore not surprising that machine learning (ML) techniques, with their ability to efficiently and effectively analyze different types of data such as images, sound, and vibration, are used in many applications, such as diagnosing machine failures, predicting the life of manufacturing equipment, recognizing product defects, and improving the robustness of sensors against failure.

However, a requirement for using ML-based solutions is the availability of an adequate amount of training data sets, which is especially important when using complex ML models such as deep learning (DL) models, as discussed by Shao et al. The reason is that these models have a large number of layers and require large training datasets. A promising approach to address these issues is the use of Transfer Learning (TL) techniques. Transfer learning allows knowledge, in the form of pre-trained models or training data, to be transferred from one domain, called the source domain, to another related but different domain, called the target domain, where training data is scarce. Examples of relevant domains include brain MRI images of different age groups, summer and winter photos, and photos taken with different color filters. It is also important to note that training data may be of poor quality, especially if the collection process is imprecise or difficult, e.g., lack of labels or imbalance in class distribution.

To address the problem of data scarcity, traditional TL-based approaches typically leverage pre-trained models and use the limited training samples of the target domain to fine-tune the trainable parameters. However, because these pre-trained models typically learn inference from huge datasets such as ImageNet, they contain many redundant features and irrelevant latent spaces that have no benefit for the target inference task. In addition, manual work is required, for example, to determine which layers are trainable. Adversarial domain adaptation (DA), on the other hand, aims to learn a target task by leveraging training samples from a source domain with the same label set. To adapt to domain shifts, DA uses neural networks to create domain-independent representations of data from different domains. If the domain-independent representation can effectively classify objects in the source domain, it may be able to recognize the same objects in the target domain. The DA approach has been shown to be effective on many image benchmarks. However, a common limitation of many DA approaches is the assumption of a balanced target-domain dataset (even when the target has only limited labels). Real-world datasets often have imbalanced class distributions, which can negatively impact the performance of DA models.

To address class imbalance, common approaches include image warping, weighted loss functions, and oversampling and undersampling of the training data for the minority and majority classes, respectively. However, the effectiveness of these methods is highly dependent on the nature of the dataset and the learning task at hand. There are also approaches that use generative models, such as generative adversarial networks (GANs), autoencoders (AEs), and diffusion models, to generate synthetic images for data augmentation. Unlike discriminative models, generative models can generate realistic data samples and are expected to have a significant impact in many application areas in the coming years. Synthetic data is typically easier and less costly to obtain than real-world data. Nevertheless, one of the major problems with synthetic data generated by these models is that systems built using synthetic datasets often fail when deployed in the real world. This is due to the misalignment of distributions between synthetic and real data and is known as the sim-to-real problem.

This paper presents a pipeline that addresses these shortcomings. The pipeline combines (A) an autoencoder-based method that augments the target data by generating synthetic samples for the minority classes using Gaussian noise and the latent space learned by the encoder, and (B) a new TL architecture based on adversarial DA that addresses the lack of training data and the synthetic data shift problem. The autoencoder-based method ensures that the target data augmented for DA has a balanced class distribution. To improve generalization to real target data, the proposed DA approach is applied using the source data and the augmented target data. The main contributions of this paper are as follows:

1) A DL pipeline that addresses the problem of small and imbalanced datasets
2) A novel adversarial DA-based approach for adapting heterogeneous source and target datasets (e.g., source and target data with different feature spaces)
3) An extensive evaluation of the proposed pipeline and a comparison with other methods using commonly used wafer fabrication datasets. We show that the combined use of both methods yields superior performance compared to using each method alone.

Adversarial domain adaptation with data augmentation

The pipeline in this paper consists of two steps. The first step uses an autoencoder-based approach to augment the target dataset with unbalanced classes. The source dataset is assumed to be balanced. The second step generates a classification model to predict the classes in the target dataset. The classification model uses as input a domain-independent latent space learned from the source and augmented target datasets using the authors' adversarial DA approach.

A. Data Augmentation with an Autoencoder

An autoencoder is a neural net trained to reconstruct its input. It consists of two components: an encoder enc that produces a compressed latent space h = enc(x) for input x, and a decoder dec that produces a reconstruction $ \hat{x} $ = dec(h). The objective is to minimize the reconstruction error between x and $ \hat{x} $ (Equation (1)).

The autoencoder can be trained using mini-batch gradient descent. In each batch, some data is fed to the autoencoder, which backpropagates the error and adjusts the weights of the network through its layers. Autoencoders can extract useful information from the data, but they can also cheat by copying inputs to outputs without learning useful properties of the data. One way to prevent this copying is to use an undercomplete autoencoder, whose latent space has a smaller dimension than the input. The smaller dimension forces the autoencoder to learn the most important attributes of the data.

To generate synthetic data using the undercomplete autoencoder, the autoencoder is first trained on the target data using the loss function in (1). The algorithm then feeds the original target data into the trained encoder and maps it to a compressed representation. Instead of passing the resulting representation directly to the decoder, it adds random noise drawn from a standard Gaussian distribution to the representation and passes it to the decoder to generate new synthetic data. The new data is given the same class label as the original data. To obtain a balanced training dataset, this procedure is repeated for the classes with few samples. Finally, the original data and the synthetic data generated in the above steps are combined to obtain the augmented target data. The augmented data is used in the DA algorithm described below.
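
As a rough sketch of this augmentation step, the following Python code assumes a trained Keras encoder/decoder pair; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

# Minimal sketch of the autoencoder-based augmentation described above,
# assuming `encoder` and `decoder` are the two halves of a trained Keras
# autoencoder and `x_class` holds the original samples of one minority class.
def augment_minority_class(encoder, decoder, x_class, n_new, noise_std=1.0):
    """Generate n_new synthetic samples for one minority class."""
    synthetic, n_made = [], 0
    while n_made < n_new:
        h = encoder.predict(x_class, verbose=0)                  # compressed latent codes
        h_noisy = h + np.random.normal(0.0, noise_std, h.shape)  # add standard Gaussian noise
        x_new = decoder.predict(h_noisy, verbose=0)              # decode into new images
        synthetic.append(x_new)
        n_made += len(x_new)
    return np.concatenate(synthetic, axis=0)[:n_new]

# Usage (illustrative): top up each minority class, reuse the original labels,
# then concatenate the synthetic samples with the original target data.
# x_scratch_aug = augment_minority_class(encoder, decoder, x_scratch, n_new=2000)
```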

B. Adversarial Domain Adaptation

・1) Network, input and output

The architecture of this paper consists of five neural networks (see Figure 1).

1) GS is the source's private generator

2) GT is the target's private generator

3) G is the shared generator

4) D is the discriminator

5) C is the classifier

Note that, for simplicity, the name of each neural network refers to both its architecture and all of its weights.

Figure 1: Illustration of the proposed DA algorithm

Source data is given by ( xs, ys, ds ), where xs is a source data sample, ys is its label, and ds is the source domain ID (for example, $ d^s_i $ = 0 for any source sample $ x^s_i $). Similarly, target data is given by ( xt, yt, dt ), where xt is a target data sample, yt is its label, and dt is the target domain ID (for example, $ d^t_i $ = 1 for any target sample $ x^t_i $). In addition, Ns is the number of samples in the source domain and Nt is the total number of samples in the target domain, where Ns ≫ Nt.

xs and xt are the inputs to the private generators GS and GT, respectively; since GS and GT are separate networks, the inputs xs and xt can have different dimensions. The shared generator G learns a domain-independent representation (DI) from the outputs of GS and GT. Therefore, the private generators must produce output vectors of the same shape. The DI produced by the corresponding networks is DI = G(GS(xs)) for source inputs and DI = G(GT(xt)) for target inputs.

DI is then used as input to networks D and C. The outputs of the two networks are $ \hat{d} $ from discriminator D and $ \hat{y} $ from classifier C.

・2) Loss function and training

Classification loss is defined by the following equation and measures the error in label prediction in both domains (considering sufficient labeled data in the source and limited labeled data in the target).
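
The equation itself is not reproduced in this article. Based on the description below, a plausible reconstruction of the classification loss is a cross-entropy over both domains with the target term weighted by λ (a sketch, not necessarily the paper's exact formula):

$$ L_c = -\frac{1}{N^s}\sum_{i=1}^{N^s} y^s_i \cdot \log \hat{y}^s_i \;-\; \lambda \frac{1}{N^t}\sum_{i=1}^{N^t} y^t_i \cdot \log \hat{y}^t_i $$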

Here, $ y^s_i $ and $ y^t_i $ are the one-hot encodings of the labels of the source input $ x^s_i $ and the target input $ x^t_i $, respectively, and $ \hat{y}^s_i $ and $ \hat{y}^t_i $ are the softmax outputs of C; λ is the penalty coefficient for the loss values obtained from the target data points. A good classifier should predict the correct label for both source and target data points.

The discriminator loss trains the discriminator to distinguish whether a DI was generated from source or target data. Here, di is the domain ID of data xi ( di ∈ {0, 1} ) and $ \hat{d}_i $ is the output of discriminator D. Since the goal of the discriminator is to reduce the domain classification error, it minimizes Ld.
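
The loss itself is not shown in this article. As a hedged reconstruction consistent with the description above, the discriminator loss can be written as a standard binary cross-entropy over all source and target samples (an assumption, not the paper's verbatim Equation (5)):

$$ L_d = -\frac{1}{N^s + N^t}\sum_{i=1}^{N^s + N^t} \left[ d_i \log \hat{d}_i + (1 - d_i) \log (1 - \hat{d}_i) \right] $$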

The generator loss is the loss in (5) with the domain ground-truth labels inverted. By minimizing Lg, the generators are trained adversarially, maximizing the loss of the discriminator.
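
Following that description, a plausible form of the generator loss is the same binary cross-entropy with the domain labels flipped, so that the generators are rewarded when the discriminator is fooled (again a reconstruction, not the paper's exact equation):

$$ L_g = -\frac{1}{N^s + N^t}\sum_{i=1}^{N^s + N^t} \left[ (1 - d_i) \log \hat{d}_i + d_i \log (1 - \hat{d}_i) \right] $$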

The key to successful DA is to learn predictive, domain-independent features across domains. A rich domain-independent representation must contain enough information for effective classification, no matter which domain the input data comes from. To achieve domain invariance, the discriminator and the generators are trained adversarially; to give the DI predictive power, the generators are also trained to minimize the classification loss. The following paragraphs describe the training algorithm in more detail.

The training of GS, GT, and G consists of optimizing Lg and Lc, because we want to minimize domain classification accuracy and maximize label classification accuracy. The discriminator is trained with Ld to maximize domain classification accuracy. The classifier is trained with Lc. The authors' training algorithm follows a mini-batch gradient descent procedure: it selects an equal number of source and target samples, computes the outputs and loss functions, and adjusts the weights in the direction opposite to the gradient vector. This process is repeated until the loss function no longer decreases. Specifically, after creating a fixed-size mini-batch, the following steps are performed. The generators update their weights to minimize the generator loss and classification loss, as in Equations 7-9. The classifier updates its weights to minimize the classification loss, as in Equation 11. During these steps, the discriminator's weights remain frozen. The discriminator then updates its weights to minimize the discriminator loss according to Equation 10.

where μ is the learning rate, and the hyperparameters β and γ are the relative weights of the loss terms.
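
As a rough illustration, the following TensorFlow sketch shows one such training step. It assumes the five networks are Keras models named G_s, G_t, G, D, and C; exactly how β and γ enter Equations 7-11 is not shown in the article, so their placement here is an assumption.

```python
import tensorflow as tf

# One training step of the adversarial DA procedure described above (a sketch,
# not the authors' code). Models G_s, G_t, G, D, C are assumed to exist.
opt_gen = tf.keras.optimizers.Adam(learning_rate=2e-4)    # mu in the text
opt_clf = tf.keras.optimizers.Adam(learning_rate=2e-4)
opt_disc = tf.keras.optimizers.Adam(learning_rate=2e-4)
bce = tf.keras.losses.BinaryCrossentropy()
cce = tf.keras.losses.CategoricalCrossentropy()

def train_step(xs, ys, xt, yt, lam=0.1, beta=1.0, gamma=1.0):
    ds = tf.zeros((tf.shape(xs)[0], 1))   # source domain ID = 0
    dt = tf.ones((tf.shape(xt)[0], 1))    # target domain ID = 1

    # 1) Update the generators (discriminator frozen): minimize beta*Lg + gamma*Lc.
    with tf.GradientTape() as tape:
        di_s, di_t = G(G_s(xs)), G(G_t(xt))
        d_hat = D(tf.concat([di_s, di_t], axis=0))
        lg = bce(tf.concat([dt, ds], axis=0), d_hat)          # flipped domain labels
        lc = cce(ys, C(di_s)) + lam * cce(yt, C(di_t))
        gen_loss = beta * lg + gamma * lc
    gen_vars = G_s.trainable_variables + G_t.trainable_variables + G.trainable_variables
    opt_gen.apply_gradients(zip(tape.gradient(gen_loss, gen_vars), gen_vars))

    # 2) Update the classifier: minimize Lc (discriminator still frozen).
    with tf.GradientTape() as tape:
        lc = cce(ys, C(G(G_s(xs)))) + lam * cce(yt, C(G(G_t(xt))))
    opt_clf.apply_gradients(zip(tape.gradient(lc, C.trainable_variables),
                                C.trainable_variables))

    # 3) Update the discriminator: minimize Ld with the true domain labels.
    with tf.GradientTape() as tape:
        di = tf.concat([G(G_s(xs)), G(G_t(xt))], axis=0)
        ld = bce(tf.concat([ds, dt], axis=0), D(di))
    opt_disc.apply_gradients(zip(tape.gradient(ld, D.trainable_variables),
                                 D.trainable_variables))
```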

Experiment

This pipeline is applied to wafer defect prediction. Wafer inspection is a critical step in semiconductor manufacturing: it evaluates the dies on a wafer and filters out defective ones. Previous work has used machine learning (ML) approaches to speed up the prediction process. However, as the authors' experiments show, real-world wafer data suffers from low quality, including missing labels and imbalanced class distributions, making most ML methods unsuitable. The experiments also compare the approach with existing algorithms, including fine-tuning-based and DL-based methods.

A. Wafer Data Set

・1) Source dataset

The MixedWM38 dataset is used as the source dataset; MixedWM38 contains one normal pattern, eight single defect patterns, and 29 mixed defect patterns, with approximately 1000 samples in each category. These wafer maps were obtained at the wafer fab. The size of each wafer map is 52 × 52. MixedWM38 has no missing labels and the data size is constant. The training data set is also balanced.

・2) Target dataset

The WM-811K dataset is used as the target dataset. It consists of 811,457 wafer maps collected from 46,293 lots. This dataset contains eight single defect patterns and one normal class, all of which are also included in MixedWM38. However, the WM-811K dataset has three common problems found in manufacturing datasets. The first problem is the large number of unlabeled samples: only about 20% of the wafer maps are labeled and can be used for training. Second, the labeled wafer maps vary in size. Finally, the dataset is highly imbalanced.

To solve the first two problems, we remove the unlabeled wafer maps and select the wafer maps of size 26 × 26 from the remaining data. We chose this size because it is the only size group for which there is data in every class. Grouped by defect type, there are 90 center, 1 donut, 296 edge-loc, 31 edge-ring, 297 local, 16 near-full, 74 random, 72 scratch, and 13,489 normal wafer maps. For each class except donut, 60% of the wafer maps were randomly selected for the training set and the rest were placed in the test set. These two sets are disjoint except that they share the same data for the donut class; since only one donut sample is available, we would still like to include such patterns in the classification.

To address the third problem, i.e., the imbalanced training data, we use the autoencoder-based data augmentation method introduced above. The encoder has a CONV layer with 64 3 × 3 filters, a RELU activation layer, and a MAXPOOLING layer. The decoder has a CONVT layer with 64 3 × 3 filters, an UPSAMPLING layer, a CONV layer with three 3 × 3 filters, and a SIGMOID output layer. For each defect class in the training set, 2000 synthetic wafer maps were generated. The normal class was skipped because the training set already contains more data than that. Note that the data augmentation uses only the WM-811K training data, without looking at the WM-811K test data.
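
A hedged Keras sketch of this encoder/decoder follows. Stride-1 "same" convolutions, a 2 × 2 pooling/upsampling factor, and an MSE reconstruction loss are assumptions, since the article does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Encoder: 64 3x3 CONV filters + RELU + MAXPOOLING, as described above.
encoder = tf.keras.Sequential([
    layers.Input(shape=(26, 26, 3)),
    layers.Conv2D(64, (3, 3), padding="same"),
    layers.ReLU(),
    layers.MaxPooling2D((2, 2)),               # 13 x 13 x 64 latent representation
])

# Decoder: 64 3x3 CONVT filters, UPSAMPLING, three 3x3 CONV filters, SIGMOID output.
decoder = tf.keras.Sequential([
    layers.Input(shape=(13, 13, 64)),
    layers.Conv2DTranspose(64, (3, 3), padding="same"),
    layers.UpSampling2D((2, 2)),               # back to 26 x 26
    layers.Conv2D(3, (3, 3), padding="same", activation="sigmoid"),
])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")   # reconstruction loss (MSE assumed)
```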

B. Description of the experiment

The pipeline is compared in different settings and against different approaches. The methods considered are as follows.

・1) Adversarial DA + augmented target data

The authors use the MixedWM38 training data as the source training data and the augmented WM-811K training data as the target training data. These data are used as input to the adversarial DA network, which is then trained following the process described above. This is the proposed approach.

In the architecture used for the experiments, GS/GT has two convolutional layers, with eight 5 × 5 filters (CONV1) and 16 5 × 5 filters (CONV2), a 2 × 2 max pooling layer after each of CONV1 and CONV2, and one fully connected layer with 2028 neurons. G has the same configuration as GS and GT, except that its last fully connected layer has only 1024 neurons and a reshaping layer of (26, 26, 3) is added at the beginning of the network. D has a configuration similar to G, but with a softmax output layer for domain prediction. The classifier has two fully connected layers with 1024 and 512 neurons, respectively, and a softmax output layer for class prediction.
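
As a rough illustration, here is a Keras sketch of these five networks. Padding, activations, and the discriminator's internal layers are assumptions (the discriminator is simplified to dense layers with a sigmoid output rather than the softmax domain output described above), and the names G_s, G_t, G, D, C are illustrative only.

```python
import tensorflow as tf
from tensorflow.keras import layers

def private_generator(input_shape):
    """G_S / G_T: eight 5x5 filters (CONV1), 16 5x5 filters (CONV2),
    a 2x2 max pooling layer after each, and a 2028-neuron fully connected layer."""
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(8, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(16, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(2028),                    # 2028 = 26 * 26 * 3
    ])

G_s = private_generator((52, 52, 3))           # MixedWM38 source wafer maps
G_t = private_generator((26, 26, 3))           # WM-811K target wafer maps

# Shared generator: reshape the 2028-d private output to (26, 26, 3),
# apply the same conv stack, and end with a 1024-neuron fully connected layer.
G = tf.keras.Sequential([
    layers.Input(shape=(2028,)),
    layers.Reshape((26, 26, 3)),
    layers.Conv2D(8, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(16, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1024),                        # domain-independent representation DI
])

# Discriminator: takes DI and predicts the domain (simplified here).
D = tf.keras.Sequential([
    layers.Input(shape=(1024,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # domain prediction
])

# Classifier: two fully connected layers (1024, 512) and a 9-way softmax output.
C = tf.keras.Sequential([
    layers.Input(shape=(1024,)),
    layers.Dense(1024, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(9, activation="softmax"),
])
```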

・2) Adversarial DA + imbalanced target data

We still use the adversarial DA network, but replace the target training data with the unaugmented, imbalanced WM-811K training data. This approach is compared with approach 1) to determine whether the data augmentation step improves the performance of the adversarial DA.

・3) Fine-tuning + augmented target data

Shao et al. proposed a fine-tuning approach that transfers knowledge learned from generic images to identify machine faults from images of induction motors, gearboxes, and bearings. They use a VGG16 model pre-trained on ImageNet; VGG16 has five convolutional blocks and a fully connected block. They freeze the first three convolutional blocks and retrain the last two convolutional blocks and the fully connected block using the machine failure diagnosis dataset. Cross-entropy loss is used to evaluate the error between the true labels and the predicted probabilities. The authors implement this approach but replace the machine failure dataset with the augmented WM-811K training dataset. The output layer of the pre-trained VGG16 model is replaced by a new layer with nine neurons corresponding to the nine classes.
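
A hedged Keras sketch of this fine-tuning baseline follows. The resize to 32 × 32 and the size of the new fully connected head are assumptions, made only because Keras' VGG16 requires inputs of at least 32 × 32 (as noted in the results section).

```python
import tensorflow as tf
from tensorflow.keras import layers

# VGG16 pre-trained on ImageNet, without its original classification head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(32, 32, 3))
for layer in base.layers:
    # Freeze block1-block3; keep block4 and block5 trainable, as described above.
    layer.trainable = layer.name.startswith(("block4", "block5"))

model = tf.keras.Sequential([
    layers.Input(shape=(26, 26, 3)),
    layers.Resizing(32, 32),                  # 26x26x3 wafer maps -> 32x32x3 (assumed)
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),     # retrained fully connected head (size assumed)
    layers.Dense(9, activation="softmax"),    # new output layer: nine classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(2e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```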

・4) Fine-tuning + imbalanced target data

This approach is identical to the previous one, but uses the imbalanced WM-811K training dataset. It is compared with the previous approach to determine whether the data augmentation step also benefits the fine-tuning approach.

・5) Vanilla classifier + augmented target dataset

A deep neural network acting as a classifier is trained to detect wafer map defects. This network is trained with cross-entropy loss on the augmented WM-811K training data. The classifier uses an architecture comparable to the prediction pipeline used in the DA method, so the comparison numbers are fair and meaningful. The classifier has three convolutional blocks and two fully connected blocks. Each convolutional block has a CONV layer and a RELU layer, with an increasing number of output filters: {16, 64, 128}. Each fully connected block has an FC layer and a RELU activation layer. The first FC layer has 512 neurons and the second FC layer has 128 neurons. The output layer has 9 neurons, followed by a SOFTMAX layer to predict the probability of each class.
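
A minimal Keras sketch of this classifier follows; the kernel sizes and "same" padding are assumptions, since the text does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Vanilla classifier: three CONV+RELU blocks with 16/64/128 filters,
# two FC+RELU blocks (512, 128 neurons), and a 9-way SOFTMAX output.
classifier = tf.keras.Sequential([
    layers.Input(shape=(26, 26, 3)),
    layers.Conv2D(16, (3, 3), padding="same"), layers.ReLU(),
    layers.Conv2D(64, (3, 3), padding="same"), layers.ReLU(),
    layers.Conv2D(128, (3, 3), padding="same"), layers.ReLU(),
    layers.Flatten(),
    layers.Dense(512), layers.ReLU(),
    layers.Dense(128), layers.ReLU(),
    layers.Dense(9, activation="softmax"),
])
classifier.compile(optimizer=tf.keras.optimizers.Adam(2e-4),
                   loss="categorical_crossentropy", metrics=["accuracy"])
```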

・6) Vanilla classifier + imbalanced target dataset

Train the same deep neural network as in 5) using the imbalanced WM-811K training data.

Results and Analysis

The TensorFlow and Keras libraries are used to train the adversarial DA and the other classification models. For the adversarial DA, training runs for 20,000 iterations with a batch size of 32, using the Adam optimizer with an initial learning rate of 2e-4 and hyperparameters λ = 0.1, β = 1, and γ = 1 (the hyperparameters are not tuned using validation samples). In the fine-tuning approach, the VGG16 pre-trained model implemented in Keras requires the input to have exactly three channels and a width and height of at least 32; the target training data has size 26 × 26 × 3, which does not meet this requirement. For the fine-tuning approach and the vanilla classifier method, training runs for 60 epochs with the Adam optimizer, a batch size of 32, and a learning rate of 2e-4. By comparing the performance in each epoch, an early-stopping procedure that preserves the best weights is applied during training.

For this evaluation, the source training dataset consisted of 5,294 evenly distributed wafer maps from nine categories. All of the experiments were repeated with target training datasets containing only 25, 50, 75, 100, 200, 500, and 1000 randomly selected samples. The purpose of these experiments is to show the effect of target training data size on the performance of the different models. Balanced classification accuracy and precision were calculated on the target test data, and 95% confidence intervals are presented in Table I and Figure 2. These confidence intervals were obtained from five repeated experiments. Table II shows the training and test times for the different approaches. These performance measures are suitable for evaluating models on imbalanced datasets. Balanced accuracy is designed for imbalanced data: it is defined as the average of the per-class recalls, where the recall of a class is its true positives divided by the sum of its true positives and false negatives. Precision, on the other hand, is calculated as the sum of true positives across all classes divided by the sum of true and false positives across all classes. The more false positives, the lower the precision.
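
For reference, the two metrics as described correspond to scikit-learn's balanced accuracy and micro-averaged precision. A toy sketch follows (the label arrays are illustrative, not from the paper):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, precision_score

# y_true / y_pred would be the class indices on the WM-811K test set;
# the values below are illustrative only.
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

bal_acc = balanced_accuracy_score(y_true, y_pred)        # mean per-class recall
prec = precision_score(y_true, y_pred, average="micro")  # TP / (TP + FP) over all classes
print(f"balanced accuracy: {bal_acc:.3f}, precision: {prec:.3f}")
```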

Figure 2. Classification accuracy scores achieved with augmented and imbalanced target data, comparing six approaches: a vanilla deep CNN trained on augmented/imbalanced target samples, a pre-trained VGG16 model fine-tuned on augmented/imbalanced target data, and the adversarial DA architecture of this paper trained on augmented/imbalanced target data.

Table I

Balanced classification accuracy and precision on the WM-811K test data

Table II.

Comparison of training and testing times for the three methods. Results are obtained on 1000 target samples drawn from the augmented target training data. The proposed DA model can be trained offline. Its prediction time is comparable to that of the vanilla classifier.

With 25 to 1000 samples, our adversarial DA method with augmented targets outperforms the fine-tuning and deep CNN methods in terms of balanced accuracy and precision; with more sophisticated models such as ResNet, the performance of both our method and the deep neural networks could be even better. Nevertheless, the inferior performance of the vanilla classifier, trained on a comparable architecture with very little data, shows a limitation of the DL approach: it requires a lot of training data to learn the model's input-output function. If training data is insufficient, the well-known overfitting problem occurs, where the model memorizes the training data and cannot generalize well to new test data. As a TL approach, fine-tuning uses a pre-trained model, so the network can start from sensible weights that can be transferred to the target task. However, it does not directly address the problem of insufficient training data. We speculate that the poor performance of the fine-tuning approach in our experiments may be due to the large difference between ImageNet and the wafer map data, which means a reasonable amount of target data is needed to successfully update the weights of the pre-trained model. Our adversarial DA approach, on the other hand, achieves the best results because it mitigates the problem of scarce target training data through the domain-invariant features that emerge during the optimization process. Given enough balanced source data, the adversarial learning framework allows these features to be learned even with very limited target data.

For all three methods, training on augmented target data significantly improves performance, demonstrating the effectiveness of the data augmentation technique when dealing with highly imbalanced data. For example, using augmented targets with the adversarial DA approach improves balanced accuracy by 5% to 16% and precision by 6% to 15% for 25 to 1000 samples. This observation is even more pronounced for the fine-tuning and vanilla classifier methods. With imbalanced targets, the fine-tuning method cannot learn anything useful. Moreover, the fact that the authors' DA approach outperforms the other methods even when 1000 augmented target samples are used for training confirms that this DA approach generalizes better on the target test data (real data).

Using DA for non-classification tasks

The authors' approach to addressing the problem of insufficient training data can be extended to tasks beyond classification. Here we briefly describe recent approaches in the areas of optimization, reinforcement learning, and robot learning to handle domain shifts and achieve effective knowledge transfer.

In the field of transfer optimization (TO), solutions from various source optimization problems are reused for target optimization problems. The approach of Jiang et al. integrates DA methods into classical evolutionary optimization algorithms to improve the search efficiency of dynamic optimization problems. Another proposed approach is to model the function to be optimized, i.e., the objective function, via an artificial neural network (ANN). Such approximations are effective in reducing costs, e.g., computational costs. However, these approaches require training the ANN using input-output pairs generated from known functions. When the underlying target function is unknown and only limited measurements are available, the authors' DA approach can use input-output pairs from a known function as the source domain to guide training on the very limited sample of measurements from the unknown function, i.e., the target domain.

One of the unsolved problems in reinforcement learning (RL) is that learned policies may not perform well on new input data because the distribution of input data may change over time. Recently, an approach has been proposed to apply DA so that the RL agent works effectively even if the input distribution changes over time. In this scenario, the source domain is a particular input distribution with a particular reward structure. In the target domain, the input distribution is changed, but the reward structure is the same. Domain shift is also a major challenge in learning-based robot perception and control.

Robots trained on simulated data often fail in real-world environments due to the gap between simulation and reality; the approach by Tzeng et al. uses a combination of a domain confusion loss (similar to Lg) and a pairwise loss to successfully adapt pose estimation from synthetic images to real images.

Limitations

While the current approach uses available data from a single source domain to improve generalization on a related target task, data from many related domains may be useful. For example, multiple labeled manufacturing datasets collected over time or from different parties could be used as source domains. Our current approach does not directly support multi-source domain adaptation. To use the authors' approach in a multi-source setting, one would need to either combine all source data into a single source domain or train on each source domain separately and select the one with the best performance. A better approach is to treat each source domain as a separate domain and learn the information shared among the different domains. Studies along this direction have shown better generalization performance on the target than the single-source approach.

Another limitation of the authors' approach is that it requires at least some labeled data from each class in the target domain. The reason for this is that autoencoder-based data augmentation procedures require that the original targets have labeled data from each category in order to construct a balanced target data set. The authors' adversarial DA approach can be used alone in an unsupervised setting with no labels in the target data, provided that the target data is balanced.

Related Research

A. Adversarial Learning-Based Approach

These methods typically learn domain-independent representations by using two competing networks: a feature extractor/generator and a domain discriminator. One of the first adversarial DA models, the domain-adversarial neural network (DANN), has three components: a feature extractor, a label predictor, and a domain classifier. The feature extractor is trained adversarially to maximize the loss of the domain classifier by reversing its gradient. The feature extractor is trained simultaneously with the label predictor to create a representation containing domain-invariant features for classification. ADDA (Adversarial Discriminative Domain Adaptation) has similar components, but its learning process includes multiple stages to train them. Singla et al. propose a hybrid version of DANN and ADDA, where the generators are trained with standard GAN loss functions.

All of these methods aim to learn domain-independent representations between the source and target domains. However, they assume that the source and target data have the same feature space (e.g., the same dimensions). Instead, our model supports heterogeneous domain adaptation, where data from the two domains can have different dimensions or different numbers of features. In addition, all of these methods consider settings where the target, although unlabeled, still has enough data and a balanced class distribution. With real-world data, however, these models may suffer from imbalanced class distributions. In this study, we consider a more realistic setting of low-quality target data, where the target has only a small amount of labeled data and is highly imbalanced.

B. Synthetic Data Augmentation

Some approaches use GAN-based architectures to generate synthetic data, such as DCGAN, CycleGAN, and Conditional GAN. Another common strategy for generative modeling is to use an autoencoder - a neural network trained to reconstruct inputs. This network has two components: an encoder that produces a compressed latent space and a decoder that produces a reconstruction. By adding noise to the compressed representation, the autoencoder produces variations of the original data. In recent years, diffusion models have received a great deal of attention due to their remarkable generative capabilities. Learning a diffusion model consists of two stages: a forward diffusion stage in which the input data is iteratively perturbed with noise, and an inverse diffusion stage in which the previous stage is reversed to attempt to recover the input data. However, diffusion models are computationally expensive due to the iterative steps involved during training, making them unsuitable for time-sensitive tasks. Selecting the right generative model for the task at hand requires consideration of the advantages, limitations, and costs of each model.

The authors adopt an autoencoder-based method because GANs are known to be unstable during training and prone to mode collapse. In addition, GANs require large amounts of training data. Autoencoder-based data augmentation methods, on the other hand, require less data for training, which fits the problem setting where the target has limited data. They are also faster than more complex diffusion models.

Conclusion

In this paper, we proposed a novel adversarial DA approach that supports heterogeneous adaptation when the source domain has different features than the target domain. The DA approach is realized by training two private generators and one shared generator. It is intended to address the problem of insufficient target training data, but it does not work well when the target data is imbalanced. Many manufacturers face the reality of low-quality data, making it difficult to collect balanced data. To address this problem, the authors further propose a pipeline that uses an autoencoder-based technique to augment the minority classes in the training data, followed by their DA approach. Experimental evaluation of this pipeline on a wafer defect dataset demonstrates its superior performance compared to other baseline approaches.
