Optimal Architecture Search Method In Federated Learning!
3 main points
✔️ Automate the design of NN architectures used in Federated Learning
✔️ Explore optimal architectures even when you can't look directly at the dataset
✔️ Outperform manually designed architectures
Towards Non-I.I.D. and Invisible Data with FedNAS: Federated Deep Learning via Neural Architecture Search
written by Chaoyang He, Murali Annavaram, Salman Avestimehr
(Submitted on 18 Apr 2020 (v1), last revised 4 Jan 2021 (this version, v4))
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
In recent years, there has been growing demand to train neural networks while preserving privacy and confidentiality. For example, a model that makes diagnoses from CT images cannot be trained by gathering patient data from hospitals all over the world in one place, because that would compromise data privacy. Federated Learning is an approach to this problem: each hospital can train a model without disclosing its raw CT images, as shown in the figure below.
However, recent research has shown that a predefined model architecture may not be optimal when training with Federated Learning on non-IID data (data that is not independent and identically distributed). It is also very difficult to design a better architecture by hand, because the model designer cannot see the distribution of the data.
So, as part of automating Federated Learning, the authors considered optimizing the neural network architecture automatically. In this approach, each edge server (for example, each hospital) searches for an architecture and weights on its local data and sends them to the management server. The management server averages them and sends the result back to the edge servers. Repeating this procedure explores the optimal architecture under the restrictions of Federated Learning. In the experimental part, the authors verify the effectiveness of the proposed method by comparing the architectures it finds with manually designed architectures.
In Federated Learning, suppose there are K edge servers, and the k-th edge server holds a local dataset D_k. When these K edge servers learn collaboratively, the objective function is defined as follows:
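The equation image is not reproduced in this text. A plausible form, assuming the usual sample-weighted federated objective with the architecture held fixed, is sketched below. The symbols N_k = |D_k|, N = Σ_k N_k, the sample notation (x_i, y_i), and the subscript "fixed" are introduced here for illustration and may differ from the paper's notation.

```latex
\min_{w} f(w, \alpha_{\text{fixed}})
  = \min_{w} \sum_{k=1}^{K} \frac{N_k}{N} \cdot \frac{1}{N_k}
    \sum_{i \in D_k} l(x_i, y_i; w, \alpha_{\text{fixed}})
```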
In this equation, w represents the weights of the network, α represents the architecture of the network, and l represents the loss function of the neural network. To minimize this objective, previous studies fix the architecture and update the network weights, then change the architecture based on the results and update the weights again. In this paper, the authors instead propose updating the architecture itself together with the network weights. In this case, the objective function can be formulated as:
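The equation referred to here is not reproduced in this text. Assuming a sample-weighted federated objective, joint optimization over w and α could be written as in the sketch below. The symbols N_k = |D_k|, N = Σ_k N_k, and the sample notation (x_i, y_i) are introduced here for illustration.

```latex
\min_{w, \alpha} f(w, \alpha)
  = \min_{w, \alpha} \sum_{k=1}^{K} \frac{N_k}{N} \cdot \frac{1}{N_k}
    \sum_{i \in D_k} l(x_i, y_i; w, \alpha)
```

The only change from the fixed-architecture setting is that α is now an optimization variable alongside w.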
Three components of NAS (Neural Architecture Search) need to be considered:
- Definition of the search space
- Search algorithm
- Performance estimation method
In this paper, the search space is the existing one defined in DARTS and MiLeNAS. This search space is shown in the figure below.
Since the search space of neural network architectures becomes excessively large when skip connections are included, the architecture is often defined in units of cells, as shown in the above figure, and the search is carried out over combinations of cells. This paper also uses this cell-based search space.
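To make the role of α concrete, here is a minimal sketch of how a DARTS-style search space treats one edge of a cell: the output is a softmax-weighted mixture of candidate operations, and the final architecture keeps the operation with the largest α. The three toy operations and the names `CANDIDATE_OPS`, `mixed_op`, and `chosen_op` are hypothetical stand-ins, not the paper's actual operation set.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical candidate operations for one edge of a cell.
CANDIDATE_OPS = {
    "skip": lambda x: x,
    "zero": lambda x: np.zeros_like(x),
    "double": lambda x: 2.0 * x,  # stand-in for a learned op such as a convolution
}

def mixed_op(x, alpha):
    """Continuous relaxation: weight every candidate op by softmax(alpha)."""
    weights = softmax(alpha)
    return sum(wt * op(x) for wt, op in zip(weights, CANDIDATE_OPS.values()))

def chosen_op(alpha):
    """Discretization: keep the op with the largest architecture parameter."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]

x = np.array([1.0, -2.0])
uniform_out = mixed_op(x, np.zeros(3))          # equal weight on all three ops
picked = chosen_op(np.array([0.1, -1.0, 2.0]))  # name of the op with the largest alpha
```

Because α enters through a differentiable mixture, it can be trained with gradients just like w, which is what makes exchanging and averaging α between servers straightforward.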
In the above search space, each edge server uses MiLeNAS to search for the optimal weights and architecture on its local data for a few epochs, as shown in the equation below.
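The equation is not reproduced in this text. As a rough illustration, the sketch below assumes the first-order mixed-level update described for MiLeNAS: the weights follow the training-loss gradient, while the architecture parameter follows the training-loss gradient plus λ times the validation-loss gradient. The toy one-parameter model, the learning rates, `lam`, and all function names are hypothetical.

```python
import numpy as np

def loss_and_grads(w, a, x, y):
    """Toy supernet: mixes a linear op (w * x) and a skip op (x) with weight a."""
    pred = a * (w * x) + (1.0 - a) * x
    err = pred - y
    loss = np.mean(err ** 2)
    grad_w = np.mean(2.0 * err * a * x)          # d loss / d w
    grad_a = np.mean(2.0 * err * (w * x - x))    # d loss / d a
    return loss, grad_w, grad_a

def local_search(w, a, train, val, steps=200, lr_w=0.05, lr_a=0.05, lam=1.0):
    """Alternate a weight step and an architecture step on local data."""
    for _ in range(steps):
        # Weights: gradient of the training loss only.
        _, grad_w, _ = loss_and_grads(w, a, *train)
        w = w - lr_w * grad_w
        # Architecture: training gradient plus lam times validation gradient.
        _, _, grad_a_tr = loss_and_grads(w, a, *train)
        _, _, grad_a_val = loss_and_grads(w, a, *val)
        a = a - lr_a * (grad_a_tr + lam * grad_a_val)
    return w, a

# Deterministic toy data following y = 3x, split into train and validation parts.
x_tr = np.linspace(-1.0, 1.0, 20)
x_val = np.linspace(-0.9, 0.9, 10)
train = (x_tr, 3.0 * x_tr)
val = (x_val, 3.0 * x_val)

w0, a0 = 2.0, 0.5
w1, a1 = local_search(w0, a0, train, val)
loss_before = loss_and_grads(w0, a0, *val)[0]
loss_after = loss_and_grads(w1, a1, *val)[0]
```

Both w and α are updated with plain gradient steps, so a local search costs roughly the same as ordinary training for a few epochs.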
Federated Neural Architecture Search
The authors define Federated Neural Architecture Search with the following steps.
- Local search: each edge server trains on its local data and searches for an architecture
- Each edge server sends its weights w and architecture α to the central server
- The central server aggregates them
- The aggregated result is sent back to each edge server, and each edge server updates its local w and α
This procedure is repeated to search for the optimal architecture.
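The steps above can be sketched as a simple loop. This is a minimal illustration, assuming w and α are NumPy arrays and that aggregation is a plain average; `run_fednas_rounds` and `toy_local_search` are hypothetical names, and the toy local search merely stands in for the real per-server search.

```python
import numpy as np

def run_fednas_rounds(w, alpha, client_data, local_search, num_rounds):
    """Repeat: broadcast (w, alpha), search locally, average, broadcast again."""
    for _ in range(num_rounds):
        # Each edge server starts from the current global w and alpha.
        results = [local_search(w.copy(), alpha.copy(), data) for data in client_data]
        # The central server averages what the edge servers send back.
        w = np.mean([r[0] for r in results], axis=0)
        alpha = np.mean([r[1] for r in results], axis=0)
    return w, alpha

def toy_local_search(w, alpha, data):
    """Hypothetical stand-in: move halfway toward client-specific targets."""
    target_w, target_alpha = data
    return w + 0.5 * (target_w - w), alpha + 0.5 * (target_alpha - alpha)

# Two edge servers whose local data favor different solutions.
client_data = [
    (np.array([1.0, 2.0]), np.array([0.0, 1.0])),
    (np.array([3.0, 4.0]), np.array([1.0, 0.0])),
]
w, alpha = run_fednas_rounds(
    np.zeros(2), np.zeros(2), client_data, toy_local_search, num_rounds=20
)
```

Only w and α cross the network in this loop; the per-server data in `client_data` never leaves the function that represents the edge server.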
The central server aggregates the results from the edge servers as follows:
As the formula shows, the aggregation simply averages the results obtained on each edge server.
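A minimal sketch of such an aggregation step is given below. It assumes FedAvg-style weighting by local dataset size, which reduces to the plain average when all edge servers hold the same amount of data; whether the paper weights in exactly this way is an assumption here, and the function name `aggregate` is hypothetical. The same function would be applied to w and to α.

```python
import numpy as np

def aggregate(client_params, client_sizes):
    """Average parameter arrays, weighting each client by its dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()        # weights N_k / N, summing to 1
    stacked = np.stack(client_params)   # shape: (num_clients, ...)
    return np.tensordot(coeffs, stacked, axes=1)

w_clients = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
w_equal = aggregate(w_clients, [100, 100])     # equal sizes: plain average
w_weighted = aggregate(w_clients, [300, 100])  # first client counts three times as much
```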
The authors prepared 16 machines, each with a GPU (RTX 2080 Ti), as edge servers.
The dataset was CIFAR-10, whose 60,000 images were divided unevenly (non-IID) among the 16 edge servers and stored locally on each of them.
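One common way to produce an uneven split like this is to draw per-class client proportions from a Dirichlet distribution. The sketch below illustrates that technique on synthetic labels; the text does not say which partitioning scheme was used, so the scheme, the `concentration` value, and the function name `dirichlet_partition` are all assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, concentration, seed=0):
    """Assign sample indices to clients with class proportions drawn from a Dirichlet."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Smaller concentration gives a more skewed share of this class per client.
        proportions = rng.dirichlet(np.full(num_clients, concentration))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = np.repeat(np.arange(10), 100)  # synthetic stand-in for 10-class labels
parts = dirichlet_partition(labels, num_clients=16, concentration=0.5)
sizes = [len(p) for p in parts]
```

Every sample lands on exactly one client, but both the number of samples and the class mix differ from client to client, which is the non-IID setting the experiment targets.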
The above figure shows the variation of test accuracy on non-IID data. (a) is the result of the manually designed model and (b) is the result of the model designed by the proposed method. This figure shows that the model designed by the proposed method achieves higher accuracy. It also shows that the improvement of accuracy is stable from round to round.
Next, the authors evaluated the efficiency of the proposed method. The results are shown in the table below.
As the table shows, the proposed method (FedNAS) needs less search time and produces a more compact model.
In this paper, the authors proposed FedNAS, a method for automatically exploring neural network architectures under the restrictions of Federated Learning. The architectures found by this method are more accurate than manually designed models, and the search takes less time.