Fast NAS Method Using Batch Normalization
3 main points
✔️ Proposed a batch normalization-based metric for evaluating architectures, which reduces the evaluation cost
✔️ Reduced the supernet training cost by training only the batch normalization layers
✔️ Achieved a large speedup in both the training and search phases without loss of accuracy
BN-NAS: Neural Architecture Search with Batch Normalization
written by Boyu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang
(Submitted on 16 Aug 2021)
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
Neural Architecture Search (NAS) is a general term for methods that search for neural network architectures automatically. A great deal of recent research has made it possible to find highly accurate architectures, but many issues remain. One of the major ones is the cost of the search: evaluating candidate architectures during the search is very expensive, so reducing this cost is an active research topic. The paper reduces the cost by using batch normalization parameters both as the architecture evaluation metric and as the only parameters trained in the supernet.
One-shot NAS searches a large supernet for a subnet that fits the desired task. Its pipeline can be divided into the following three stages:
- Supernet training
- Subnet search
- Subnet retraining
A supernet contains multiple candidate operations in each layer, as shown in the figure below.
Since the weights of all candidate operations cannot be trained in a single backpropagation pass, a single-path architecture (Op3 → Op2 → Op3 in the figure) is sampled according to a sampling policy at each iteration, and only that path is trained. Because a new path is sampled at every iteration, the whole supernet is trained over the course of many iterations.
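The per-iteration sampling can be sketched as follows. This is a minimal illustration, assuming a uniform sampling policy and a toy supernet with 3 layers and 4 candidate operations per layer; the names `sample_single_path` and `count_updates` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical toy supernet: 3 layers, each with 4 candidate operations.
NUM_LAYERS = 3
OPS_PER_LAYER = 4

def sample_single_path(num_layers, ops_per_layer, rng):
    """Pick one candidate operation index per layer, uniformly at random."""
    return [rng.randrange(ops_per_layer) for _ in range(num_layers)]

def count_updates(num_iterations, seed=0):
    """Count how often each (layer, op) pair would be trained."""
    rng = random.Random(seed)
    counts = [[0] * OPS_PER_LAYER for _ in range(NUM_LAYERS)]
    for _ in range(num_iterations):
        path = sample_single_path(NUM_LAYERS, OPS_PER_LAYER, rng)
        for layer, op in enumerate(path):
            # Only the sampled operation of each layer is trained this step.
            counts[layer][op] += 1
    return counts

counts = count_updates(1000)
```

Each row of `counts` sums to the number of iterations, and every candidate operation is visited many times, which is why the whole supernet ends up trained even though each step touches a single path.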
Once the supernet is trained, the next step is to search it for the best-performing architecture. Subnets are usually evaluated by their accuracy on validation data.
In the retraining phase, the K most accurate subnets found in the subnet search phase are retrained. They are then evaluated on the validation data, and the most accurate one is selected as the final architecture.
Batch normalization layer
The batch normalization layer is used in network pruning and is considered suitable for evaluating the importance of channels. Let X_in be the input to a batch normalization layer. The output X_out is calculated as follows:
$$X_{out} = \gamma \cdot \frac{X_{in} - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$
Here μ and σ² are the mean and variance of X_in over the batch, and ε is a small constant for numerical stability. The parameters updated during training are β and γ.
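The standard batch normalization computation can be sketched in a few lines of Python. This is an illustration for a single channel holding a batch of scalar values; the function name `batch_norm` and the default `eps` are assumptions made for the example.

```python
import math

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize one channel over the batch, then scale by gamma and shift by beta."""
    mean = sum(x) / len(x)
    # Biased (population) variance over the batch.
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

out = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
```

After normalization the output has mean β and standard deviation close to |γ|, so γ directly controls how strongly the channel contributes to the next layer. That is the intuition behind using γ as an importance signal.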
The framework of the proposed method is outlined in the following figure.
The method is based on one-shot NAS, with two differences:
- In the supernet training phase, the convolution parameters are fixed and only the parameters of the batch normalization layers are trained.
- In the subnet search phase, the proposed batch normalization-based metric is used to find subnets.
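The first difference, training only the batch normalization parameters while the convolution weights stay fixed, can be illustrated with a deliberately tiny example. This is a sketch under strong simplifying assumptions: one scalar weight `w` stands in for a convolution, the loss is mean squared error, and the gradients are written out by hand. None of the names come from the paper.

```python
import math

def normalize(z, eps=1e-5):
    """Zero-mean, unit-variance normalization over the batch."""
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    return [(v - mean) / math.sqrt(var + eps) for v in z]

def train_bn_only(x, target, w, steps=200, lr=0.1):
    """Gradient descent on gamma and beta only; the weight w is never updated."""
    gamma, beta = 1.0, 0.0
    # w is frozen, so the normalized activations are the same at every step.
    n = normalize([w * v for v in x])
    losses = []
    for _ in range(steps):
        y = [gamma * v + beta for v in n]
        err = [yi - ti for yi, ti in zip(y, target)]
        losses.append(sum(e * e for e in err) / len(err))
        # Gradients of the mean squared error with respect to gamma and beta.
        grad_gamma = sum(2 * e * v for e, v in zip(err, n)) / len(err)
        grad_beta = sum(2 * e for e in err) / len(err)
        gamma -= lr * grad_gamma
        beta -= lr * grad_beta
    return gamma, beta, w, losses

gamma, beta, w_after, losses = train_bn_only(
    x=[1.0, 2.0, 3.0, 4.0], target=[2.0, 4.0, 6.0, 8.0], w=0.7
)
```

Here the loss falls to nearly zero although `w` keeps its initial value. In a real framework the same effect would typically be achieved by excluding the convolution weights from the optimizer, which also saves the cost of computing their gradients.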
We now describe the batch normalization-based metric used in the subnet search phase.
The following figure illustrates the batch normalization-based indicator.
In one-shot NAS, as mentioned earlier, each layer contains several candidate operations (Op in the figure); the internal structure of an operation is shown in the middle of the figure. The proposed method uses only the parameters of the last batch normalization layer in each operation. Consequently, it cannot score an operation that does not end with a batch normalization layer. However, the operations used in most existing NAS methods do end with one, so the proposed method is widely applicable.
The batch normalization-based indicator is computed in two steps:
- Evaluation of each operation using its batch normalization layer
- Evaluation of the architecture using the operation scores from the first step
These are explained in the following sections.
Evaluation of Operation by Batch Normalization
The score of each candidate operation in each layer is calculated by the following formula.
In this formula, the operation is scored using γ, the learnable scale parameter of batch normalization.
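The text here states only that the score is computed from γ. As a minimal sketch, assume the score of an operation is the mean absolute value of γ over the channels of its last batch normalization layer. Both this exact form and the function name `op_indicator` are assumptions made for illustration, and the γ values are invented.

```python
def op_indicator(gammas):
    """Score one candidate operation from the gamma values of its last BN layer.

    Assumption: the score is the mean absolute gamma over channels.
    """
    return sum(abs(g) for g in gammas) / len(gammas)

# Hypothetical gamma values for two candidate operations in the same layer.
score_a = op_indicator([0.9, -1.1, 0.8, 1.2])
score_b = op_indicator([0.1, -0.2, 0.05, 0.15])
```

Under this assumption, the operation whose γ values are larger in magnitude (`score_a`) is rated as more important than the one whose outputs are scaled toward zero (`score_b`).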
Evaluation of the architecture using the operation scores
Once the score of each operation has been obtained with the formula above, the scores are combined to compute the score of the architecture. Specifically, it is calculated according to the following formula.
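How the per-operation scores turn into a ranking of architectures can be sketched as follows. The text says only that the scores are "combined". This example assumes the combination is a plain sum over layers, and it enumerates every path, which is feasible only for a toy search space. The score table and function names are hypothetical.

```python
from itertools import product

def architecture_score(op_scores, path):
    """Sum the scores of the operation chosen in each layer (assumed combination rule)."""
    return sum(op_scores[layer][op] for layer, op in enumerate(path))

def top_k_architectures(op_scores, k):
    """Rank every single-path architecture by score and return the best k."""
    all_paths = product(*(range(len(layer)) for layer in op_scores))
    ranked = sorted(
        all_paths,
        key=lambda p: architecture_score(op_scores, p),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical per-operation scores: op_scores[layer][op].
op_scores = [
    [0.2, 0.9, 0.5],
    [0.7, 0.1, 0.4],
    [0.3, 0.6, 0.8],
]
best = top_k_architectures(op_scores, k=2)
```

Because scoring an architecture this way needs only a table lookup and no forward pass on validation data, each candidate is evaluated almost for free. Real search spaces are far too large to enumerate, so a search algorithm would replace the exhaustive loop, usually under a constraint such as a FLOPs budget.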
Comparison with the baseline methods
SPOS and FairNAS are chosen as the baselines. The comparison results are shown in the table below.
The proposed method significantly reduces the computational cost of both SPOS and FairNAS without any loss of accuracy.
Comparison with SOTA methods
The comparison results with the SOTA methods are shown in the above table. The proposed method combined with SPOS achieves higher performance with fewer FLOPs than the manually designed networks. It also achieves higher performance with fewer FLOPs than the SOTA NAS methods. As for the search cost, the proposed method requires less than 1/10 of the cost of methods that search the architecture directly.
To address search cost, one of the major issues in NAS, the proposed method uses the batch normalization layers both as the only trained parameters and as the evaluation metric. Because only the batch normalization layers of the supernet are trained, supernet training time is shortened, and because subnets are evaluated using the batch normalization parameters, they can be explored efficiently.