A Method For Comparing Neural Network Architectures Using Contrastive Learning And Convolutional Graph Networks.

NAS 10/08/2022

3 main points
✔️ Compare the performance of architectures using contrastive learning and graph convolutional networks
✔️ Compute rewards for reinforcement learning agents based on comparisons for stable inference
✔️ Successfully discovered architectures with higher accuracy than existing methods at lower cost

Contrastive Neural Architecture Search with Neural Architecture Comparators
written by Yaofo Chen, Yong Guo, Qi Chen, Minli Li, Wei Zeng, Yaowei Wang, Mingkui Tan
(Submitted on 8 Mar 2021 (v1), last revised 6 Apr 2021 (this version, v2))
Comments: Accepted by CVPR 2021.
Subjects: Computer Vision and Pattern Recognition (cs.CV)

code：

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Deep neural networks have achieved great success in many different fields. However, designing an effective neural network architecture requires a lot of effort and relies heavily on human expertise.

Besides designing architectures by hand, there is another way: designing them automatically. This approach is called NAS (Neural Architecture Search), and it aims to find architectures with higher performance than hand-designed ones. In existing NAS methods, the key challenge is how to estimate the performance of each explored architecture, because training every explored architecture from scratch on a dataset is very time-consuming and computationally expensive.

Approaches that address this include weight-sharing methods, which train a supernet (a large parent network) and search for architectures within it, and predictor-based methods, which estimate the performance of an architecture with a predictor. However, training with different random seeds can cause large fluctuations in performance, making it difficult to predict absolute performance accurately.

In this paper, we propose a Contrastive Neural Architecture Search (CTNAS) method, which performs the search by comparing architectures. By evaluating the explored architectures through comparison, the evaluation accuracy can be improved because the absolute performance of each architecture does not have to be estimated. We also improve accuracy and stability by using contrastive learning and graph convolutional networks.

Overview of the proposed method (CTNAS)

The proposed method (CTNAS) is summarized in the following figure.

The architecture explored by the Controller (Sampled Architecture α) is compared with the baseline by NAC to compute the Controller's reward.

Contrastive Neural Architecture Search

The proposed method aims to obtain a ranking of candidate architectures from comparisons alone, without using absolute performance. The module used for this purpose is called the Neural Architecture Comparator (NAC).

NAC is a module that takes any two architectures α and α' and outputs the probability p that α is better than α'.

The Controller is trained using the probability p given by NAC as its reward. Specifically, it learns the policy π(α;θ) by solving the optimization problem below.

In the above equation, β represents the architecture of the baseline model. In comparing architectures, a baseline is set and the expected probability of exceeding that baseline is set as the reward.
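The role of the comparison probability as a reward can be illustrated on a toy search space that is small enough to enumerate. The sketch below is not the paper's Controller: `toy_nac`, the hidden scores, and the exact-gradient update are all assumptions made for illustration (a real Controller would sample architectures and use a policy-gradient estimate). It only shows how maximizing the expected probability of beating the baseline β shapes the policy π(α;θ).

```python
import math


def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_nac(score_a, score_b):
    """Hypothetical stand-in for NAC: probability that a is better than b."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def expected_reward(probs, rewards):
    """Expected comparison probability under the policy."""
    return sum(p * r for p, r in zip(probs, rewards))


def train_policy(rewards, steps=500, lr=1.0):
    """Gradient ascent on the expected reward of a softmax policy.

    For a softmax policy, dJ/dlogit_i = pi_i * (r_i - J).
    """
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        j = expected_reward(probs, rewards)
        logits = [l + lr * p * (r - j) for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


# Hidden quality of four candidate architectures (illustrative numbers only).
hidden_scores = [0.0, -2.0, 4.0, -1.0]
baseline_index = 0  # plays the role of beta
rewards = [toy_nac(s, hidden_scores[baseline_index]) for s in hidden_scores]
policy = train_policy(rewards)
best_index = max(range(len(policy)), key=lambda i: policy[i])
```

Under these assumptions the policy concentrates on the architecture that is most likely to beat the baseline, even though no absolute accuracy is ever shown to it.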

Update baselines for comparison

Since CTNAS uses the result of a comparison with a baseline architecture as the reward, the search performance depends heavily on which architecture is set as the baseline. If the baseline is fixed, the Controller is only pushed to find architectures that beat that one fixed baseline, which makes it difficult to reach architectures with sufficiently high performance. It is therefore necessary to improve the baseline during the search. In this section, we describe the baseline update algorithm used for this purpose.

To improve the baseline architecture, we aim to set the best architecture explored so far as the baseline. Specifically, we construct a candidate baseline set H and dynamically add the architectures sampled by the Controller to it. By comparing any architecture α_i with the other architectures in H, we compute the average comparison probability according to the equation below.

The architecture with the highest average comparison probability is taken as the new baseline. By using the best architecture found so far as the baseline, we aim to improve the search performance by exploring architectures that are better still.
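The baseline update can be sketched as follows. This is a minimal illustration, assuming a comparator function that returns the probability that its first argument is better than its second; `toy_nac` and the scalar "architectures" are hypothetical stand-ins, not the trained NAC.

```python
import math


def toy_nac(score_a, score_b):
    """Hypothetical stand-in for NAC: probability that a is better than b."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def average_win_probability(index, candidates, compare):
    """Mean probability that candidates[index] beats every other candidate."""
    others = [c for j, c in enumerate(candidates) if j != index]
    return sum(compare(candidates[index], other) for other in others) / len(others)


def select_baseline(candidates, compare):
    """Pick the candidate with the highest average comparison probability."""
    if not candidates:
        raise ValueError("the candidate baseline set is empty")
    if len(candidates) == 1:
        return candidates[0]
    averages = [
        average_win_probability(i, candidates, compare)
        for i in range(len(candidates))
    ]
    best_index = max(range(len(candidates)), key=lambda i: averages[i])
    return candidates[best_index]


# Candidate baseline set H, here represented by hidden scores (illustrative).
history = [0.2, 1.5, -0.3]
baseline = select_baseline(history, toy_nac)

# A newly sampled architecture joins H and the baseline is re-selected.
history.append(2.1)
new_baseline = select_baseline(history, toy_nac)
```

Because the selection only relies on pairwise comparisons, it never needs the absolute accuracy of any architecture in H.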

Neural Architecture Comparator (NAC)

So far, we have explained the basic concept of the proposed method. From here on, we explain the details of NAC, which is the key module of the proposed method.

As mentioned above, NAC takes two architectures as input and outputs the probability that the explored architecture is better than the baseline. How exactly are architectures fed into the model?

Architecture comparison by GCN

In this paper, we use a GCN (Graph Convolutional Network) to feed architectures into the model. A GCN is a natural choice because the structure of a neural network can be regarded as a graph, and because CTNAS is a supernet-based architecture search method.

The architecture is passed to the GCN in the representation shown in the figure below.

The architecture is represented as a DAG (Directed Acyclic Graph). The adjacency matrix describes how the nodes are connected, and a node attribute matrix holds the features of each node.
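As a concrete illustration of this representation, the sketch below builds an adjacency matrix and one-hot node attributes for a small cell. The operation vocabulary `OPS`, the function name, and the example cell are assumptions chosen for illustration; the actual operation set depends on the search space.

```python
# Hypothetical operation vocabulary; the real one depends on the search space.
OPS = ["input", "conv3x3", "conv1x1", "maxpool3x3", "output"]


def encode_architecture(node_ops, edges):
    """Return (adjacency, attributes) for a DAG.

    node_ops: operation name of each node, listed in topological order.
    edges: (src, dst) index pairs; src < dst is required, which guarantees
           that the graph has no cycles.
    """
    n = len(node_ops)
    adjacency = [[0] * n for _ in range(n)]
    for src, dst in edges:
        if not 0 <= src < dst < n:
            raise ValueError(f"edge ({src}, {dst}) breaks the topological order")
        adjacency[src][dst] = 1

    attributes = []
    for op in node_ops:
        if op not in OPS:
            raise ValueError(f"unknown operation: {op}")
        attributes.append([1 if op == name else 0 for name in OPS])
    return adjacency, attributes


# A four-node cell: the input feeds two parallel branches that merge at the output.
adjacency, attributes = encode_architecture(
    ["input", "conv3x3", "maxpool3x3", "output"],
    [(0, 1), (0, 2), (1, 3), (2, 3)],
)
```

Requiring edges to point from a lower to a higher node index keeps the adjacency matrix strictly upper triangular, which is a simple way to enforce the DAG property.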

Since we use a two-layer GCN in this paper, the features of architecture α just before the fully connected layer can be calculated as follows.

The features calculated in this way are fed into the fully connected layer, whose output is the probability that the explored architecture performs better than the baseline, as shown in the equation below.
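The forward pass described here can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: the weights are random and untrained, the layer sizes are arbitrary, self-loops are written directly into the adjacency matrices, node features are mean-pooled into one vector per architecture, and the two vectors are concatenated before a single fully connected layer with a sigmoid.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(matrix):
    return [[max(0.0, v) for v in row] for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gcn_embedding(adjacency, attributes, w0, w1):
    """Two graph-convolution layers, then mean pooling over nodes (assumed)."""
    hidden = relu(matmul(adjacency, matmul(attributes, w0)))
    z = matmul(adjacency, matmul(hidden, w1))
    return [sum(col) / len(z) for col in zip(*z)]


def compare(arch_a, arch_b, w0, w1, w_fc):
    """Probability that arch_a is better than arch_b under the given weights."""
    features = gcn_embedding(*arch_a, w0, w1) + gcn_embedding(*arch_b, w0, w1)
    logit = sum(f * w for f, w in zip(features, w_fc))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
w0 = random_matrix(3, 4, rng)   # 3 node attributes -> 4 hidden units
w1 = random_matrix(4, 2, rng)   # 4 hidden units -> 2 output features
w_fc = [rng.uniform(-0.5, 0.5) for _ in range(4)]  # acts on both embeddings

one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
chain = ([[1, 1, 0], [0, 1, 1], [0, 0, 1]], one_hot)  # 0 -> 1 -> 2
skip = ([[1, 1, 1], [0, 1, 1], [0, 0, 1]], one_hot)   # extra edge 0 -> 2
p = compare(chain, skip, w0, w1, w_fc)
```

With untrained weights the value of `p` carries no meaning; the point is only the data flow from graph to embedding to a single comparison probability.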

To train NAC, we need a dataset consisting of architectures and their performance evaluated on a dataset such as CIFAR-10. These pairs of architecture and performance can be obtained by training a supernet or by training a set of architectures from scratch. From the architectures in this dataset, we form every possible pair and label each pair according to which architecture is better. In this way, NAC training can be treated as a binary classification problem.
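The pair construction can be sketched as below. The architecture names and accuracies are made up for illustration, and labeling a tie as 1 is an assumption of this sketch.

```python
from itertools import combinations


def build_comparison_pairs(evaluated):
    """Turn (architecture, accuracy) records into labeled pairs.

    Each pair (a, b) gets label 1 if a is at least as accurate as b, else 0.
    """
    pairs = []
    for (arch_a, acc_a), (arch_b, acc_b) in combinations(evaluated, 2):
        label = 1 if acc_a >= acc_b else 0
        pairs.append((arch_a, arch_b, label))
    return pairs


# Illustrative records only; the names and accuracies are not from the paper.
evaluated = [("arch_0", 0.91), ("arch_1", 0.93), ("arch_2", 0.89), ("arch_3", 0.94)]
pairs = build_comparison_pairs(evaluated)
```

With m evaluated architectures this yields m(m-1)/2 training pairs, so four architectures already give six labeled examples.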

Data exploration for NAC learning

To train a good NAC, we need as many pairs of architectures and their evaluated accuracy as possible. In practice, however, only a limited number of such pairs is available because of the computational cost. With limited training data, the performance of NAC may degrade, which in turn makes the architecture search itself difficult.

To solve this problem, this paper proposes a data exploration method that adopts the architecture sampled during the search as unlabeled data. In this method, the class with the maximum probability predicted by NAC is adopted as a label for the unlabeled data. That is, when NAC compares the explored architecture A with the baseline B, if NAC predicts that A is better than B, then A > B is assigned as the label for this pair. Specifically, we label them as shown in the equation below.

However, the predicted labels may be noisy, since NAC may make incorrect predictions. Therefore, we evaluate the prediction quality by computing a confidence score for each architecture pair; the confidence score for the k-th architecture pair is computed as follows.

We select the pseudo-labeled data with the top K confidence scores and combine them with the labeled data to train NAC. To balance the two types of data, the originally labeled data and the data labeled by NAC, we set the proportion of pseudo-labeled data to 0.5.
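The labeling and selection steps can be sketched as follows. Two details are assumptions of this sketch rather than facts taken from the text: the label threshold of 0.5, and the confidence score defined as the distance of the predicted probability from 0.5. The pair names and probabilities are made up.

```python
def pseudo_label(pairs, nac_probs, top_k):
    """Label unlabeled pairs with NAC predictions and keep the most confident.

    pairs: unlabeled (architecture, baseline) pairs.
    nac_probs: predicted probability that the first element of each pair is better.
    top_k: number of pseudo-labeled pairs to keep.
    """
    scored = []
    for pair, prob in zip(pairs, nac_probs):
        label = 1 if prob >= 0.5 else 0   # class with the maximum probability
        confidence = abs(prob - 0.5)      # assumed confidence score
        scored.append((confidence, pair, label))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pair, label) for _, pair, label in scored[:top_k]]


# Illustrative inputs only.
unlabeled = [("a0", "b"), ("a1", "b"), ("a2", "b"), ("a3", "b")]
predicted = [0.95, 0.55, 0.10, 0.48]
selected = pseudo_label(unlabeled, predicted, top_k=2)
```

Pairs whose predicted probability is close to 0.5 are dropped first, since those are the predictions NAC is least sure about.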

By adding pseudo-labeled data in this way, NAC is expected to improve its generalization performance on unseen architectures.

The above is the explanation of NAC. Finally, as a summary, we list the advantages of NAC over existing methods.

Instead of estimating absolute performance, comparisons can be used to provide more stable rewards.
It is faster than evaluating performance on validation data, because it evaluates architectures by comparing their graphs.
Given m architectures with their evaluated accuracy, NAC can form m(m-1)/2 training pairs, whereas a method that estimates accuracy directly obtains only m training samples. NAC therefore needs fewer evaluated architectures for training.

Experiments

From here, we will see the results of the validation of the proposed method.

Experiments on NAS-Bench-101

First, we examine how well NAC compares architectures. To assess this, we use rank correlation. Rank correlation is a correlation between rankings, and we verify the accuracy of NAC by measuring how strongly the ground-truth ranking and the ranking predicted by NAC are correlated.
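As a small worked example of rank correlation, the sketch below computes Kendall's Tau from concordant and discordant pairs. It is the plain tau-a variant, which ignores ties in the normalization, and the numbers are made up for illustration; it is not the evaluation code used in the paper.

```python
from itertools import combinations


def kendall_tau(truth, predicted):
    """Kendall's Tau (tau-a) between two score lists of equal length."""
    if len(truth) != len(predicted):
        raise ValueError("both lists must have the same length")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(truth)), 2):
        direction = (truth[i] - truth[j]) * (predicted[i] - predicted[j])
        if direction > 0:
            concordant += 1   # the pair is ordered the same way in both lists
        elif direction < 0:
            discordant += 1   # the pair is ordered in opposite ways
    total_pairs = len(truth) * (len(truth) - 1) // 2
    return (concordant - discordant) / total_pairs


# Illustrative values: ground-truth accuracies and predicted scores.
ground_truth = [0.90, 0.92, 0.94, 0.96]
nearly_right = [1.0, 2.0, 4.0, 3.0]   # only the last two are swapped
tau = kendall_tau(ground_truth, nearly_right)
```

A value of 1 means the two rankings agree on every pair, and -1 means they disagree on every pair. In the example, one of the six pairs is ordered differently, so the result is (5 - 1) / 6.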

The result of the comparison is shown in the figure below.

The results show that the proposed method achieves the highest rank correlation (KTau), as well as the highest average accuracy.

Experiments on ImageNet

Next, we compare the performance of the proposed method on ImageNet with other NAS methods. The results are shown in the table below.

The table shows that the performance of the explored architectures is high compared to other methods, while the exploration time is very short.

Performance verification of each NAC module

The time cost of architecture evaluation

The comparison of the time cost of architecture evaluation with other NAS methods is shown below.

This shows that the proposed method (CTNAS) can perform architecture evaluation in a very short time.

Effects of Baseline Update

To verify the effect of baseline update, we compare three cases: the case of running CTNAS with a fixed baseline, the case of running CTNAS with a randomly selected baseline, and the case of the proposed method. The results are as follows.

It can be seen that the variance of the accuracy is large when the baseline is randomly selected. Also, when we compare the fixed baseline and the proposed method, we can see that the performance is improved to some extent, although there is not a big difference.

Effects of data exploration methods

To verify the effectiveness of the data exploration method, we compare the results with and without data exploration. Let r denote the proportion of data obtained through data exploration; we vary r and check how the results change. The results are as follows.

This table shows that when no data exploration is done (r=0), the accuracy is low, and when too much reliance is placed on exploration (r=0.8), the variance is high. From this result, we can see that r=0.5 is appropriate for the proportion of data.

Summary

In this paper, we proposed Contrastive Neural Architecture Search, which performs a search based on the comparison of sampled architectures and baselines. I thought it was very interesting to see the architecture evaluation based on the comparison and the input of architectures using graph networks.