MATE: Multi-agent Accessibility-specific Modality Transformation Framework

3 main points
✔️ Proposed "MATE," an open-source multi-agent system specialized in modality conversion to assist people with disabilities
✔️ Built ModConTT, a dataset for modality conversion task classification, and a fine-tuned BERT model trained on it
✔️ The proposed model outperforms existing LLM and ML methods and has potential for application in a wide range of fields

MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications
written by Aleksandr Algazinov, Matt Laing, Paul Laban
(Submitted on 24 Jun 2025 (v1), last revised 15 Jul 2025 (this version, v2))
Comments: Published on arXiv.
Subjects:  Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

This study proposes MATE (Multi-Agent Translation Environment), an open-source support framework built on a multi-agent system (MAS), to address the lack of accessibility that users with disabilities face in digital environments.

MATE performs translation between different modalities (text, speech, images, video, etc.) in response to user requests, making information easily accessible to people with visual or auditory limitations.
Features include a "ModCon-Task-Identifier" model that analyzes user input and automatically determines the most appropriate conversion task, enabling a variety of tasks such as text-to-speech (TTS), speech recognition (STT), image caption generation (ITT), and image-to-speech explanation (ITA).

In addition, the authors constructed "ModConTT," a dedicated dataset for modality conversion task classification, and used it to compare the proposed model against existing LLMs and machine learning models.
The proposed model achieved high accuracy at low cost, and the framework shows potential for application in a wide range of domains such as medicine, education, and transportation.

Proposed Methodology

MATE consists of an "interpreter agent" that interprets user requests and seven types of "specialized agents" that perform specific conversion tasks.

The Interpreter Agent identifies the task type from the input sentence and assigns processing to the corresponding specialized agent.
Each agent leverages existing high-performance models (e.g., Whisper, Stable Diffusion, Tacotron 2, BLIP) to perform one of seven conversions: TTS (text to speech), STT (speech to text), TTI (text to image), ITT (image to text), ITA (image to audio), ATI (audio to image), and VTT (video to text).
For task determination, the ModCon-Task-Identifier, a fine-tuned version of BERT using the ModConTT dataset created by the authors, was employed to achieve higher accuracy than generic LLMs and classical machine learning models.
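The routing described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword table is a hypothetical stand-in for the fine-tuned BERT classifier, and the agents are stubs rather than real Whisper, Tacotron 2, or BLIP calls. Only the seven task labels and the UNK fallback come from the article.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword rules standing in for the ModCon-Task-Identifier.
# The real system uses a fine-tuned BERT classifier, not keyword matching.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "TTS": ("read aloud",),
    "STT": ("transcribe",),
    "TTI": ("draw",),
    "ITT": ("caption",),
    "ITA": ("describe aloud",),
    "ATI": ("picture of this sound",),
    "VTT": ("subtitle",),
}


def identify_task(request: str) -> str:
    """Return the first task whose keyword occurs in the request, else UNK."""
    text = request.lower()
    for task, phrases in KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return task
    return "UNK"


def make_stub_agent(task: str) -> Callable[[str], str]:
    """Build a placeholder agent; a real one would call a conversion model."""
    def agent(payload: str) -> str:
        return f"[{task}] would process: {payload}"
    return agent


# One specialized agent per conversion task.
AGENTS: Dict[str, Callable[[str], str]] = {t: make_stub_agent(t) for t in KEYWORDS}


def dispatch(request: str) -> str:
    """Interpreter step: classify the request, then hand it to one agent."""
    task = identify_task(request)
    if task == "UNK":
        return "Could not determine a conversion task for this request."
    return AGENTS[task](request)


if __name__ == "__main__":
    print(dispatch("Please transcribe this recording"))
    print(dispatch("What is the weather like?"))
```

Keeping the classifier behind a single function such as `identify_task` means the keyword rules could be swapped for a learned model without touching the dispatch logic.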

The system is designed for local execution, offering privacy protection and flexible customization, making it suitable for real-time assistance in the medical and educational fields.

Experiments

In the experiments, the authors first compared several LLMs (GPT-3.5-Turbo, Llama-3.1-70B, and GLM-4-Flash) as interpreters using the ModConTT dataset.

On a task classification set of 230 samples, GPT-3.5-Turbo performed well with an accuracy of 0.865, but the highest scores came from ModCon-Task-Identifier, the fine-tuned BERT model, with an accuracy of 0.917 and an F1 score of 0.916.
The proposed model also outperformed classical models such as logistic regression and random forests trained on TF-IDF features and BERT embeddings.
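For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and an F1 score can be computed for a multi-class task classifier. The labels and predictions are made-up toy data, not results from the paper, and macro averaging is an assumption here, since the article does not say which F1 averaging the authors used.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only.
y_true = ["TTS", "TTS", "STT", "STT", "UNK", "UNK"]
y_pred = ["TTS", "TTS", "STT", "TTS", "UNK", "STT"]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```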

The misclassification analysis showed the highest failure rate in the UNK (unknown task) category, followed by STT and ATV.
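A per-category failure rate of this kind can be computed as in the sketch below. The data are invented for illustration, and the definition used (misclassified samples divided by total samples of that true class) is an assumption, since the article does not specify how the authors measured failure rate.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def failure_rate_by_class(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank true classes by the share of their samples that were misclassified."""
    totals = Counter(y_true)
    misses = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    rates = {label: misses[label] / totals[label] for label in totals}
    # Highest failure rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only.
true_labels = ["UNK", "UNK", "UNK", "STT", "STT", "STT", "STT", "TTS", "TTS"]
pred_labels = ["TTS", "STT", "UNK", "STT", "TTS", "STT", "STT", "TTS", "TTS"]

if __name__ == "__main__":
    for label, rate in failure_rate_by_class(true_labels, pred_labels):
        print(f"{label}: {rate:.2f}")
```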

These results indicate that combining a MAS with a specialized classification model is effective for complex modality conversion tasks, and they support its usefulness as an assistive tool in medicine and education.
