Catch up on the latest AI articles

FedNano: Lightweight And Efficient Distributed Learning Of Large-scale Multimodal Models

3 main points
✔️ Proposed FedNano, a lightweight federated learning framework for large-scale multimodal models
✔️ Trains only NanoAdapters on the client side, significantly reducing communication and computational costs
✔️ Fisher Merging enables highly accurate aggregation even under non-uniform (non-IID) data distributions

FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
written by Yao Zhang, Hewei Gao, Haokun Chen, Weiguo Li, Yunpu Ma, Volker Tresp
(Submitted on 12 Jun 2025)
Comments: 12 pages, 3 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

code: 

The images used in this article are from the paper, the introductory slides, or were created based on them.

Overview

In recent years, multimodal large language models (MLLMs), which handle multiple modalities such as images and language, have attracted a great deal of attention. While they perform well on advanced tasks such as cross-modal retrieval and visual question answering, their large parameter counts make them difficult to deploy on client devices and to operate in real-world scenarios that require privacy protection. Federated Learning (FL) is a promising approach for training models without centralizing distributed data, but applying it to MLLMs faces many barriers, including limited computational resources, heavy communication load, and non-IID data.

In this paper, a new FL framework, FedNano, is proposed to overcome these challenges. FedNano keeps the computationally intensive large language model (LLM) on the server and performs adaptation on the client side with a lightweight module called NanoEdge. This design reduces client-side storage by more than 95% and cuts the parameters that must be communicated to less than 0.01% of the full model. Furthermore, the Fisher Merging technique maintains high generalization performance even when client data distributions are non-uniform.

Proposed Methodology

The core of FedNano is a "server-centric LLM + client-side lightweight adaptation" architecture. NanoEdge consists of modality-specific encoders, connectors, and a trainable component called the NanoAdapter. The NanoAdapter is designed using low-rank decomposition based on LoRA (Low-Rank Adaptation), which allows flexible task-specific adaptation while significantly reducing computation and communication.
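To make the cost savings concrete, here is a minimal NumPy sketch of a LoRA-style low-rank adapter. The class name, dimensions, rank, and scaling convention are illustrative assumptions, not FedNano's actual implementation; the point is that only the two small factors A and B are trained and communicated, while the pretrained weight stays frozen on the server.

```python
import numpy as np

class LoRAAdapter:
    """Hypothetical sketch of a LoRA-style low-rank adapter (not the paper's code)."""

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Only these two low-rank factors are trainable and communicated.
        self.A = rng.normal(0.0, 0.02, size=(d_in, rank))  # down-projection
        self.B = np.zeros((rank, d_out))                   # up-projection, zero-initialized
        self.scale = alpha / rank

    def delta(self, x):
        # Low-rank update added to the frozen layer's output: (x A B) * scale.
        return (x @ self.A @ self.B) * self.scale

    def num_trainable(self):
        return self.A.size + self.B.size

adapter = LoRAAdapter(d_in=4096, d_out=4096, rank=4)
full_params = 4096 * 4096
print(adapter.num_trainable() / full_params)  # → 0.001953125, ~0.2% of the full weight
```

With rank 4 the adapter carries about 0.2% of the parameters of a single 4096×4096 weight matrix, and since B is zero-initialized the adapter contributes nothing until training begins, which illustrates how adapter-only communication stays tiny relative to the full model.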

In addition, when aggregating the NanoAdapter updates collected from clients, FedNano applies Fisher Merging based on the Fisher Information Matrix (FIM). This mechanism estimates the importance of each client's update and weights it accordingly, effectively integrating information from clients with statistically different data distributions. In this way, FedNano provides scalable, privacy-preserving federated learning of MLLMs in both model structure and communication design.
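The weighting idea can be sketched with a diagonal-Fisher approximation: each parameter is averaged across clients with weights given by its Fisher information, so parameters a client estimates confidently dominate the merge. The function below is our own simplified illustration, not the paper's aggregation code.

```python
import numpy as np

def fisher_merge(params, fishers, eps=1e-8):
    """Merge per-client parameter arrays using diagonal Fisher weights.

    params, fishers: lists of same-shaped arrays, one entry per client.
    A small eps keeps the division stable where all Fisher values are ~0.
    """
    params = [np.asarray(p, dtype=float) for p in params]
    fishers = [np.asarray(f, dtype=float) for f in fishers]
    numerator = sum(f * p for f, p in zip(fishers, params))
    denominator = sum(fishers) + eps
    return numerator / denominator

# Two clients hold different values; client 0 has much higher Fisher
# information (more certainty), so the merge is pulled toward it.
merged = fisher_merge([np.array([1.0]), np.array([3.0])],
                      [np.array([9.0]), np.array([1.0])])
print(merged)  # → [1.2], close to the high-Fisher client's value of 1.0
```

Plain FedAvg would return 2.0 here; the Fisher-weighted result of 1.2 shows how the merge favors the client whose estimate carries more information, which is exactly what helps under non-IID data.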

Experiments

To validate the effectiveness of FedNano, the authors conducted experiments on two representative visual question answering (VQA) benchmarks, ScienceQA and IconQA. Advanced MLLMs such as MiniGPT-4 and LLaVA-1.5 were used for evaluation, and the data was partitioned across 5 to 10 clients using Dirichlet distributions to simulate non-IID data environments.
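Dirichlet partitioning is the standard recipe for simulating heterogeneous clients: for each class, per-client proportions are drawn from a Dirichlet distribution, and a smaller concentration parameter alpha yields a more skewed (more non-IID) split. The function below is an illustrative sketch under that common recipe; the name and defaults are our own, not taken from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=5, alpha=0.5, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw this class's share for each client; small alpha => skewed shares.
        proportions = rng.dirichlet([alpha] * num_clients)
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client, part in zip(client_indices, np.split(idx, cuts)):
            client.extend(part.tolist())
    return client_indices

labels = np.repeat(np.arange(4), 100)  # toy dataset: 4 classes, 100 samples each
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
print([len(p) for p in parts])  # uneven per-client sizes, total still 400
```

Every sample is assigned to exactly one client, but both the per-client sizes and the per-client class mixtures are uneven, which is the heterogeneity the comparison methods are stress-tested against.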

For comparison, FedNano was tested against traditional FL methods such as FedAvg, FedProx, and FedDPA-F, as well as against centralized training (the upper performance bound) and local fine-tuning (the lower bound). The results showed that FedNano achieved the highest average accuracy in all settings, with excellent robustness especially under strong data heterogeneity. FedNano-EF, a variant that uses a simplified FIM estimate, was also evaluated and substantially reduced computational cost in exchange for a slight decrease in accuracy. Furthermore, FedNano's scalability and generalization performance were confirmed in settings with more clients and greater heterogeneity across tasks.

