Catch up on the latest AI articles

Comparison Of Architectural Evaluation Methods In NAS!

Comparison Of Architectural Evaluation Methods In NAS!


3 main points
✔️ A paper comparing methods for evaluating explored architectures in neural network architecture search (NAS)
✔️ Investigate which algorithm performs best for each of several time limits for algorithm execution
✔️ Proposed a method called OMNI, which combines the best performing algorithms and shows even higher performance

How Powerful are Performance Predictors in Neural Architecture Search?
written by Colin WhiteArber ZelaBinxin RuYang LiuFrank Hutter
(Submitted on 2 Apr 2021 (v1), last revised 27 Oct 2021 (this version, v2))
NeurIPS 2021
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)


The images used in this article are from the paper, the introductory slides, or were created based on them.

first of all

Recently, Neural Architecture Search (NAS), a method to automatically search the structure of neural networks, has attracted much attention. Although NAS is very useful because it can automatically search architectures, the search process is computationally expensive, so various studies have focused on how to reduce the computational cost. In particular, the evaluation stage of candidate architectures is a bottleneck, so there is a need to reduce the computational cost of this part.

Many methods have been proposed to evaluate candidate architectures. These methods can be categorized into several families, and each method has been compared only within the family. In this study, we compared the methods not only within the families.

research background

What is NAS?

NAS (Neural Architecture Search) is a method to automatically search the structure of neural networks. One of the most famous is Zoph&Le (2017)( In this method, a candidate network is generated by a generator using RNNs and is trained to evaluate its architecture. Based on the evaluation value, the next candidate network is explored by reinforcement learning. This method is an epoch-making method that can automatically search the architecture of the neural network, but because the evaluation of the architecture of the candidate is done by actually learning it, the computation cost is very heavy, and it was a method that requires a considerable amount of computer resources. Therefore, various researches have been conducted to speed up this method.

Summarizing those studies, we can classify each method into several groups. The groups include.

  1. model-based method
  2. Learning curve-based methods
  3. hybrid method
  4. zero-cost approach
  5. weight-sharing architecture

model-based method

The model-based method speeds up the evaluation of explored architectures by using a predictor that takes the structure of the model as input and predicts its performance on that architecture. Note that this method requires a dataset (a pair of candidate architectures and their performance at that time) to predict performance, and time to prepare this predictor.

Learning curve-based methods

Learning curve-based methods are methods that train the explored architecture and evaluate it by looking at the changes in the learning curve. The most obvious and typical method is early termination. Unlike model-based methods, this method does not require a dataset or learning time in advance, but it should be noted that it requires a long time to evaluate the explored architecture because it needs to be trained every time with the explored architecture.

hybrid method

This method combines both model-based and learning curve-based methods. Because it combines both, it has the disadvantage of requiring more datasets, more training time, and more time to evaluate the explored architectures, but it is a method that can evaluate the performance of architecture with very high accuracy.

zero-cost approach

Zero-cost methods are those that require no prior training time, no dataset, and little or no time to evaluate the explored architecture. Specific methods include NASWOT.

weight-sharing architecture

This method prepares a very large neural net that has already been trained on the target task and extracts a part of it to search for architectures. Since it has already been trained as a large neural network, the cost of training the extracted architecture is low, and evaluation is fast. One-shot NAS is a well-known method.

In this paper, we compare the methods of each of these groups on a common metric and evaluate their performance.


We will now show the results of comparing each of these methods.

The trade-off between initialization time and query time

To compare each method, we need to understand the trade-off between initialization time and query time. Initialization time is the time required in advance before the phase of actually exploring the architecture, and the query time is the time required during the phase of actually exploring the architecture. Specifically, in the model-based approach, initialization time corresponds to the time required to train a performance predictor to evaluate the explored architecture. The query time corresponds to the time required to evaluate the explored architecture with this performance predictor. Thus, it can be seen that the initialization time is very expensive for model-based methods, while the query time is of small cost. On the other hand, learning curve-based methods do not require a pre-training phase and therefore do not require any initialization time. However, the query time is very expensive because each time we evaluate each architecture, we need to train that architecture.

Thus, there is usually a trade-off between initialization time and query time. Therefore, if you have many candidate architectures to explore, you should choose an algorithm with a small query time, and if you do not have many computing resources and want to explore easily, you should choose an algorithm with a small initialization time. Thus, the algorithm that should be selected depends on the user's scene.

experimental setup

Because of the tradeoffs described above, we have prepared 11 different costs that can be spent on initialization time and 14 different costs that can be spent on query time, for a We have prepared a total of 154 combinations. For each of these cost combinations, we verify the algorithm that shows the highest performance.

Here, we use rank correlation as a measure of the algorithm's performance. The rank correlation is a measure of how similar two rank tables are given. We evaluate the performance of each architecture explored by each algorithm and compare the rank order table and the actual rank order table of that architecture (obtained by actually training it to the end) by rank order correlation to verify how correctly we evaluated the performance of the architecture. Specifically, Kendall tau, Pearson correlation, and Spearman correlation are used as rank order correlations.


In the figure below, we compare the relationship between initialization time and query time and the Kendall score at that time on NAS-Bench-201 in CIFAR-10 for each score.

The figure shows that out of the 154 different cost settings, there are only 7 algorithms that we're able to achieve the highest score for any one of the methods. This can be seen more clearly in the figure below.

This figure is color-coded by the algorithm with the highest score on the combination of initialization time and query time for each dataset. We will look at this figure to see which method is better for each time budget in turn.

First, we look at low initialization time and low query time. We see that SynFlow (gray) and Jacobian Covariance (blue) perform relatively well at this time. However, we can see from the bottom right figure that these zero-cost methods do not perform well on datasets with large search spaces, such as DARTS.

Next, we look at low initialization time and high query time. In this region, SoTL-E (red) shows consistently high performance.

At high initialization time and low query time, GCN (purple) and SemiNAS (black) show better performance. However, for larger initialization time budgets, the boosting tree (brown) performed better. For model-based methods, the initialization time corresponds to the training time of the performance predictor, so this result shows that it works well when there is enough performance data. What is most interesting in this regard is that in NAS-bench-101/201, SynFlow and Jacobian covariance each performed better than model-based methods given an initialization time of 30 hours with a computation time of 3 seconds. than the model-based methods given an initialization time of 30 hours with a computation time of 3 seconds each. This may indicate that NAS-bench-101/201 does not have as much data as the model-based methods require.

The fact obtained from these results is that there are algorithms that show high performance, but each of them is specialized for each initialization time and query time, and there is no algorithm that shows high performance in a general-purpose manner. Here, the authors thought that it would be possible to develop a more general-purpose high-performance method by combining algorithms that show high performance in each of these domains (time cost). Therefore, the authors consider combining the best learning curve method (SoLT-E) and the best zero-cost method (Jacobian covariance), which were identified in this experiment, by adding them to the features of the model-based method. This method is named OMNI by the authors; the performance of OMNI is shown in the figure below.

This figure visualizes the percentage performance of OMNI compared to the best performing algorithm in each experimental setting, excluding OMNI. The light-colored areas indicate where OMNI performs very well, while the dark-colored areas indicate where the existing methods perform better. This shows that OMNI performs well over a wide range of both methods, which confirms that OMNI is a more general-purpose method.

In this study, the authors chose the method of adding other methods to the features of the model-based method as a method of combination, but we believe that there must be more sophisticated methods of combination. Therefore, we argue that research in the direction of considering such combinations may be promising in the future.


In this paper, we compared different groups of NAS methods using a common index and verified which method is the best method for each time budget. We also proposed OMNI, which is a more general method for each time budget, by combining the superior methods found from the results. I thought the paper was very interesting in that it not only compared existing NAS methods with a common index but also showed a new research direction for combining NAS methods.

If you have any suggestions for improvement of the content of the article,
please contact the AI-SCHOLAR editorial team through the contact form.

Contact Us