Universal Deoxidation Of Semiconductor Substrates Using Machine Learning And Real-time Feedback Control

3 main points
✔️ Machine learning applied to in situ endpoint detection of epitaxial growth in semiconductor manufacturing
✔️ Reflection high-energy electron diffraction (RHEED) image analysis, which gives image data reflecting the crystallinity of the substrate surface, requires expert knowledge and contains errors
✔️ Application of hybrid convolution and image transformer (CNN-ViT) models for real-time and highly accurate endpoint detection

Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control
written by Chao Shen, Wenkang Zhan, Jian Tang, Zhaofeng Wu, Bo Xu, Chao Zhao, Zhanguo Wang
(Submitted on 4 Dec 2023)
Comments: Accepted on arXiv
Subjects:    Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)


The images used in this article are from the paper, the introductory slides, or were created based on them.


As a specific example, we look at the application of machine learning to one semiconductor manufacturing process. Thin film deposition is an integral part of semiconductor processing. In epitaxial growth, where the film grows along the crystal orientation of the underlying substrate, research has been conducted on process control to remove surface oxides (native oxide) prior to thin film deposition. Optimizing the deoxidation process prior to molecular beam epitaxy (MBE) on arbitrary substrates is a multidimensional challenge that is sometimes controversial. Because semiconductor materials and growth processes vary, the determination of the substrate deoxidation temperature depends heavily on the expertise of the deposition process engineer. Here, the authors employ a machine learning (ML) hybrid convolution and vision transformer (CNN-ViT) model. The model takes as input a reflection high-energy electron diffraction (RHEED) video, which gives image data reflecting the crystallinity of the substrate surface, and outputs the deoxidation state of the substrate, allowing the substrate to be deoxidized automatically under a controlled architecture. The approach can also be extended to deoxidation processes on other substrates. Furthermore, models trained on data from a single MBE instrument show the potential to be deployed with high accuracy on other MBE instruments. In contrast to traditional methods, the authors' approach is very practical. It standardizes deoxidation temperatures across a variety of equipment and substrate materials and advances standardization research in semiconductor pretreatment, an important milestone in thin film growth technology. The concepts and methods demonstrated in this study are expected to revolutionize semiconductor manufacturing in the optoelectronics and microelectronics industries when applied to a wide variety of material growth processes.


Epitaxial thin films are at the heart of state-of-the-art optoelectronic and microelectronic devices. The crystal quality and defect density of these layers are greatly affected by the growth conditions and by the starting surface after substrate preparation. For the growth of high-quality epitaxial layers by molecular beam epitaxy or metalorganic vapor phase epitaxy (MOVPE), a deoxidation process is critical: it removes the native oxides on the semiconductor substrate that would otherwise prevent good crystal growth.

Typically, an etchant is used to intentionally remove the oxide film prior to epitaxy, but a fresh native oxide film forms instantly when the substrate is exposed to the ambient atmosphere. The time and temperature of deoxidation are complex and controversial, as they depend on the thickness and structure of the oxide film on the substrate. Reflection high-energy electron diffraction (RHEED) is typically used to monitor surface reconstruction and thereby determine oxide desorption from the substrate. In previous work, artificial intelligence (AI) has been used to analyze RHEED patterns during MBE growth. One approach detects RHEED patterns in real time during the deoxidation of Si(111) substrates and classifies them by their similarity to a particular surface reconstruction. In another, a machine learning (ML) model was developed and trained using RHEED video as input to provide real-time feedback on surface morphology for process control.

However, previous reports have generally concentrated on post-processing analysis of model results for specific materials and data collected, neglecting to build models that are universally applicable to similar applications that allow for real-time deployment. In addition, the collection and application of the data set was limited to the same MBE instrument, and no effort was made to explore models that could maintain consistent performance across different instruments. A universal model would reduce reliance on the experience of process engineers, improve reproducibility of material preparation, and better adapt to constantly evolving technology and material advances.

In this paper, the authors collected a large amount of RHEED video data recorded during the deoxidation of GaAs, along with a smaller dataset covering Ge and InAs. By repeatedly optimizing the training parameters, they developed a very robust hybrid convolution and vision transformer (CNN-ViT) model. They demonstrated that the model can accept samples of various resolutions as input, without being constrained by the resolution of the camera hardware. A detailed analysis of the model's parameters shows that its classification results have clear boundaries, which indicates a high sensitivity to the data.

Furthermore, the regions with the highest weights output by the attention module of the CNN-ViT model are consistent with the regions that experienced process engineers focus on. This consistency indicates that the model is reasonable and has strong interpretability. In-situ automated deoxidation experiments were also performed: during the dynamic substrate heating process, the model accurately identified the deoxidation state of GaAs, Ge, and InAs substrates and provided accurate deoxidation temperatures.

In addition, the model was applied to RHEED data collected from substrates and MBE instruments not included in the data set, where it remained highly accurate, highlighting its robust universal performance. This study demonstrates that a single data source can produce a universal model that works across a variety of instruments and material systems. Furthermore, the universality of the model allows each stage of material growth to be standardized, reducing errors caused by traditional judgments based on human experience.

It is hoped that the development of a more universal standardization model will further advance the standardization process in semiconductor manufacturing.


Samples were prepared using a Riber 32P MBE system with an arsenic (As) valved cracker (which effectively cracks materials with high vapor pressure) and an effusion cell to control vapor-phase compounds. Substrate temperature was measured with a C-type thermocouple. RHEED in the MBE growth chamber was used to analyze and monitor the substrate surface during the deoxidation process. RHEED patterns were recorded at an accelerating voltage of 12 kV (RHEED 12 from STAIB). A darkroom equipped with a camera was set up to continuously capture RHEED video while the substrate was rotated at 20 rpm. The exposure time was 100 ms and the frame sampling rate was 8 frames per second (fps). As shown in Figure 1, the data captured by the camera during the dynamic heating and deoxidation process is cropped so that only a selected square area is kept. The collected data is then split into individual images in chronological order. Each image is normalized in the luminance channel and converted into a 2D matrix with a depth of 8 bits. Multiple consecutive 2D matrices are then stacked to form a 3D matrix, which is the input to the model. The output of the model determines whether the substrate has been deoxidized. If deoxidation is incomplete, the substrate temperature continues to rise. Once deoxidation is complete, the current temperature is recorded as the deoxidation temperature, which marks the end of the experiment. Preprocessing the raw RHEED video data yielded 320,000 NPY files for training.
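The preprocessing steps described above (crop a square region, normalize luminance to 8 bits, stack consecutive frames into a 3D matrix) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the min-max normalization, the non-overlapping 12-frame windows, the crop coordinates, and the function names are all assumptions, and the frames are synthetic random data.

```python
import numpy as np

FRAMES_PER_SAMPLE = 12  # assumption: frame count taken from the study described later


def crop_square(frame, top, left, size):
    """Keep only a square region of interest from a 2D grayscale frame."""
    return frame[top:top + size, left:left + size]


def normalize_to_uint8(frame):
    """Min-max normalize luminance and quantize to 8-bit depth (assumed scheme)."""
    frame = frame.astype(np.float64)
    lo, hi = frame.min(), frame.max()
    if hi == lo:  # flat frame: avoid division by zero
        return np.zeros(frame.shape, dtype=np.uint8)
    scaled = (frame - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


def build_samples(frames, top, left, size, n=FRAMES_PER_SAMPLE):
    """Stack consecutive normalized crops into (n, size, size) blocks."""
    processed = [normalize_to_uint8(crop_square(f, top, left, size)) for f in frames]
    usable = len(processed) - len(processed) % n  # drop an incomplete tail
    return [np.stack(processed[i:i + n]) for i in range(0, usable, n)]


# Synthetic stand-in for decoded RHEED video frames (no real data here).
rng = np.random.default_rng(0)
video = [rng.integers(0, 4096, size=(200, 260)) for _ in range(30)]
samples = build_samples(video, top=20, left=40, size=128)
print(len(samples), samples[0].shape, samples[0].dtype)
```

Each resulting block could then be written to disk with `np.save`, which would produce the NPY files mentioned in the text.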

Figure 1. Overall framework of the experiment.

Compared to other convolution-based models, CNN-ViT incorporates the Transformer's self-attention mechanism, which enhances the model's ability to capture global information from images. This reduces dependence on a large number of parameters while ensuring robustness. Traditional convolutional architectures can be limited by their receptive fields when processing global information and often require many parameters to capture a wide range of it. In addition, ViT is unaffected by input position and is more flexible for images of different sizes. The authors' model also includes an up-sampling layer before the convolution layer, as shown in Figure 2a, which increases adaptability to inputs of various pixel sizes without changing the model structure. Raw data processed in the manner of Figure 2a is standardized into a fixed-size block matrix. This sequence is then input to a transformer encoder in which each layer comprises a multi-head attention mechanism and a feed-forward neural network, as shown in Figure 2b. Finally, the results processed by the feed-forward neural network are integrated and passed sequentially through a multi-layer perceptron (MLP) layer, a GELU layer, and another MLP layer. In the multi-head attention mechanism of Figure 2c, the input sequence is linearly transformed into query (Q), key (K), and value (V) representations, which are divided into multiple subspaces, each called an attention head. Each attention head has its own weight matrices for computing the attention distribution. The outputs of the attention heads are then concatenated and linearly transformed to yield the final output of the multi-head attention mechanism.
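To make the multi-head attention step more concrete, here is a minimal NumPy sketch of standard scaled dot-product multi-head self-attention. It is not the authors' implementation: the token count, embedding size, number of heads, and random weights are illustrative assumptions, and layer normalization, residual connections, and the feed-forward network are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (tokens, dim); weights: (dim, dim). Returns (output, attention maps)."""
    tokens, dim = x.shape
    head_dim = dim // num_heads

    def split(m):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return m.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    # Linear projections to query, key, value, then split into heads.
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, tokens, tokens)
    attn = softmax(scores, axis=-1)  # attention distribution per head
    heads = attn @ v  # (heads, tokens, head_dim)
    # Concatenate the heads and apply the final linear transform.
    merged = heads.transpose(1, 0, 2).reshape(tokens, dim)
    return merged @ w_o, attn


rng = np.random.default_rng(0)
tokens, dim, num_heads = 16, 32, 4  # illustrative sizes, not from the paper
x = rng.normal(size=(tokens, dim))
w_q, w_k, w_v, w_o = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, attn.shape)
```

The `attn` array holds one attention map per head. Maps of this kind are what attention heatmaps are typically derived from.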

Tests were conducted to evaluate the accuracy of the model with varying parameters and input sizes. As shown in Figure 2d, as the depth and number of heads were gradually increased, verification accuracy showed an upward trend and verification loss showed a downward trend. However, setting the depth and heads above 16 did not significantly improve the verification accuracy, nor did it significantly decrease the verification loss. This suggests that further increasing these parameters only increases model complexity without effectively improving model performance.

Furthermore, as shown in Figure 2e, the model's performance also improves as the number of input images increases. However, when the number of images exceeds 12, the model's accuracy decreases. This may be due to the larger number of parameters, which requires more epochs for the model to reach better results. The authors find that the best balance between training time and accuracy is to use 12 consecutive images as one input to the model. This choice also follows from the acquisition geometry: with the substrate rotating at 20 revolutions per minute and a frame sampling rate of 8 frames per second, 24 frames of RHEED data are collected during one substrate revolution. Of these 24 frames, 12 contain duplicate information, so feeding 12 frames at a time avoids redundant RHEED data and improves the model's data processing efficiency. Finally, the number of pixels in each image was varied to study the effect on model accuracy. After training these models for 100 epochs, model performance improved gradually as the number of pixels in the image increased, as shown in Figure 2f. However, once the input size exceeds 64 pixels, the improvement in accuracy is limited. For an input pixel size of 128, the model deployed in the program can generate about 9 results per second, which is very close to the sampling rate of the camera and makes full use of the RHEED data it collects. Considering also that richer input information improves accuracy, the authors set the input pixel size for each image to 128. With the model structure fixed in this way and after sufficient training, the validation accuracy reached 99.95% with an average validation loss of only 0.001646399.
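The frame-count reasoning above is simple arithmetic and can be checked with a short sketch. The rotation speed, camera rate, and inference rate come from the text. The halving step only restates the article's claim that half of the frames in one revolution are duplicates, and the function name is illustrative.

```python
def frames_per_revolution(rotation_rpm, camera_fps):
    """Number of camera frames captured during one substrate revolution."""
    seconds_per_revolution = 60.0 / rotation_rpm
    return seconds_per_revolution * camera_fps


frames = frames_per_revolution(rotation_rpm=20, camera_fps=8)
unique = frames / 2  # the article states that half of the frames are duplicates

# Throughput check: about 9 results per second against an 8 fps camera.
inference_rate = 9.0
camera_fps = 8.0
keeps_up = inference_rate >= camera_fps

print(frames, unique, keeps_up)  # 24.0 12.0 True
```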

Figure 2. Structure of the CNN-ViT model: a) Schematic diagram showing convolutional data processing with up-sampling. b) Architecture of ViT. c) Multi-head attention mechanism. d) Variation of model validation accuracy and validation loss for different heads and depths, (e) number of images, and (f) image pixels.

As shown in Figure 3a, a typical deoxidation data set was selected to analyze the features learned by the CNN-ViT model. First, a slight perturbation was applied to the original data to generate a corresponding adversarial sample and observe the model's response. As shown in Figure 3b, the generated sample was nearly identical to the original image, indicating that the model is robust. The regions of interest of the model in the input data were then visualized as an attention heatmap, as shown in Figure 3c. Figure 3a is annotated using a grid partitioning method to highlight the regions of interest found in the attention heatmap. The attention regions are concentrated near the RHEED specular spots, which is consistent with the decision process of an experienced process engineer, who relies on the luminance difference between the specular spots and the surrounding background. This indicates strong interpretability. The authors then randomly selected five oxidized and five deoxidized data sets and used the t-distributed stochastic neighbor embedding (t-SNE) algorithm to map the model's high-dimensional features into a two-dimensional space, forming the scatterplot shown in Figure 3d. The scatterplot separates clearly into two categories with distinct boundaries, as indicated by the red dashed box in Figure 3d, suggesting that the model is highly sensitive in determining the deoxidation state of the substrate. Additionally, the activation values of each feature map on the training set were averaged and plotted as curves, as shown in Figure 3e. Three typical convolution kernels and convolution layer output feature maps were selected from Figure 3e, and their parameters are visualized in Figures 3f and 3g. The weights of the convolution kernels and the convolution layer outputs changed significantly near the RHEED specular spots, again demonstrating that the model is consistent with human identification methods.

The model was then applied to data collected from other MBE instruments. As shown in Figure 3h, the probability of identifying deoxidation was high, averaging 96.3%, even when the model was applied to substrates with different rotation speeds and to GaSb substrate data not included in the data set. The probability of identifying oxidation was also high at 88.9%, indicating that the model is highly versatile without the need for new training.

Figure 3. Model feature analysis: a) Visualization of original image. b) Generation of adversarial samples. c) Attention heatmap. d) t-SNE visualization of high-dimensional features. e) Mean activation curve. f) Visualization of convolution kernel. g) Visualization of feature map after convolution. h) Model processing results for data not included in the data set.

In addition, deoxidation is generally performed at the lowest possible temperature to minimize non-stoichiometric effects caused by uneven evaporation of atoms from the substrate surface. Therefore, when the model determines that the substrate is still oxidized, the program raises the substrate temperature above the current level. The program then holds this temperature for a period of time before the model makes another determination, and the process repeats until the substrate is judged to be deoxidized. The substrate is considered deoxidized when 95% of 24 consecutive model determinations yield a deoxidation result. This avoids the situation where an accurate deoxidation temperature cannot be identified because the oxide film on the substrate surface has a non-uniform thickness.
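The feedback loop described above can be sketched as follows. The window of 24 determinations and the 95% criterion come from the text. Everything else is an assumption made for illustration: the function names, the 5 °C step, the 48 determinations per hold, and `fake_model`, a toy stand-in for the real classifier. No hardware control or hold time is modeled.

```python
from collections import deque

WINDOW = 24        # consecutive determinations, from the article
THRESHOLD = 0.95   # fraction of "deoxidized" results required, from the article


def is_deoxidized(window):
    """True when a full window has at least 95% positive determinations."""
    return len(window) == WINDOW and sum(window) / WINDOW >= THRESHOLD


def hold_and_check(determinations):
    """Slide a 24-result window over the determinations made during one hold."""
    window = deque(maxlen=WINDOW)
    for d in determinations:
        window.append(bool(d))
        if is_deoxidized(window):
            return True
    return False


def find_deoxidation_temperature(classify_hold, start_temp, step, max_temp):
    """Raise the temperature stepwise until the decision rule is satisfied."""
    temp = start_temp
    while temp <= max_temp:
        if hold_and_check(classify_hold(temp)):
            return temp
        temp += step  # not yet deoxidized: heat further
    return None


def fake_model(temp):
    """Hypothetical classifier output: 48 yes/no results per temperature hold."""
    if temp >= 405:
        return [True] * 48
    if temp == 400:
        return [i % 2 == 0 for i in range(48)]  # ambiguous, partly "yes"
    return [False] * 48


print(find_deoxidation_temperature(fake_model, start_temp=350, step=5, max_temp=450))
```

With a 24-result window, the 95% criterion means at least 23 of the 24 determinations must be positive, so a single outlier frame does not block detection, but a half-deoxidized surface does.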

Results and Discussion

The program was used to perform automated deoxidation experiments on GaAs substrates. The phases of program operation were divided according to changes in the "reminder information" shown in the program interface, as depicted in Figure 4a. As depicted in Figure 4b, 11 heating cycles were performed during the run, starting at 350 °C and reaching 405 °C. After each heating step, the substrate was held at that temperature for 6 minutes. Before the 6 minutes were up, the RHEED shutter was opened and the model made a series of determinations. The determination results were plotted as scatter plots, and a moving average was used for their statistical analysis (Figure 4c). At the deoxidation temperature, the probability of the model output being "Yes" increased sharply, while at other temperatures the model output was mainly "No." The program continuously counted the most recent 24 outputs of the model and took the current temperature as the deoxidation temperature only when the proportion of "Yes" outputs exceeded 95%, indicating that the RHEED pattern from any angle of the substrate was judged to be deoxidized, as shown in Figure 4d. Typical RHEED images at 350 °C, 400 °C, and 405 °C are shown in Figures 4e-4g. The RHEED images at 350 °C and 400 °C showed weak brightness and relatively blurred spot features. In contrast, the RHEED pattern obtained at 405 °C showed clear, well-defined spots, indicating that the substrate was successfully deoxidized at 405 °C. The program thus enabled automated deoxidation of GaAs substrates. Similar experiments were also successfully performed on Ge and InAs substrates.
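The moving-average statistic applied to the model's "Yes"/"No" outputs can be illustrated with a short sketch. The window length of 5 and the synthetic output sequence are assumptions chosen for readability. The article does not state the window used for Figure 4c.

```python
def moving_average(values, window):
    """Trailing moving average; early points average whatever is available."""
    averaged = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        averaged.append(running / min(i + 1, window))
    return averaged


# Synthetic model outputs: 0 = "No" (oxidized), 1 = "Yes" (deoxidized).
outputs = [0] * 10 + [1] * 10
smoothed = moving_average(outputs, window=5)
print(smoothed[9], smoothed[11], smoothed[14])  # 0.0 0.4 1.0
```

The smoothed curve rises from 0 to 1 over one window length, which is the kind of sharp increase in "Yes" probability the text describes at the deoxidation temperature.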

Figure 4. a) Partition of the execution phase of the program. b) Temperature curves of the substrate. c) Output results of the model and the statistical results of the moving average method. d) Program determines if the substrate is in deoxidation phase. e) RHEED captured at 350 °C. f) RHEED captured at 400 °C. g) RHEED captured at 405 °C.

In addition, a portion of the RHEED data from the deoxidation of GaSb substrates in a Riber C21 MBE system, recorded at a substrate rotation speed of 30 rps and a camera sampling rate of 8 fps, was input into the model. The output of the model is shown in Figure 5a. From the 400th sequence to the 800th sequence, the probability of the model outputting "Yes" gradually increases and the moving average curve rises steadily. Typical RHEED images were taken from the 400th, 600th, and 800th sequences, as shown in Figures 5b-5d. In the 400th sequence, the RHEED image shows very little pattern, indicating that deoxidation has not occurred. In the 600th sequence, there is a sharp main spot in the RHEED image, but the surrounding spots are inconspicuous, indicating that the substrate is gradually approaching deoxidation. In the 800th sequence, the specular spots in the RHEED pattern become both sharper and brighter. The surrounding small spots also become progressively more prominent, indicating that the substrate is approaching a deoxidized state. This result confirms the model's ability to identify deoxidation of materials it has not seen, on other instruments, and its strong universality.

Figure 5. Model results for GaSb substrate deoxidation data. a) Model output results and moving average statistics. b) RHEED captured in sequence 400. c) RHEED captured in sequence 600. d) RHEED captured in sequence 800. e) RHEED captured in sequence 600. f) RHEED captured in sequence 800. g) RHEED captured in sequence 400. h) RHEED captured in sequence 600. i) RHEED captured in sequence 800.


This report presents a comprehensive study of automated substrate deoxidation using a CNN-ViT hybrid model. The model is trained on diverse datasets, including RHEED video data from the deoxidation of GaAs, Ge, and InAs substrates, and is shown to adapt to a variety of resolutions and camera hardware. Detailed analysis of the model's parameters, attention mechanism, and features highlighted its robustness and its consistency with human empirical methods, demonstrating strong interpretability. In addition, automated deoxidation experiments were performed in situ. The model accurately identifies the deoxidation state of GaAs, Ge, and InAs substrates during the dynamic substrate heating process and provides accurate deoxidation temperatures. The model also showed considerable accuracy when processing GaSb substrate data from another MBE instrument. The universality of this model across a variety of equipment and substrates offers a way to advance standardization in semiconductor manufacturing, and it is hoped that a more universal standardized model will be developed in the future.

友安 昌幸 (Masayuki Tomoyasu)
JDLA G certificate 2020#2, E certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; Amiko Consulting LLC, CEO