AWORLD: Efficient Learning Platform For Agent AI With A Distributed Framework

11/09/2025

3 main points
✔️ AWORLD speeds up experience generation in distributed environments by 14.6x and significantly improves learning efficiency
✔️ An agent trained with reinforcement learning on top of Qwen3-32B exceeds GPT-4o in accuracy on the GAIA benchmark
✔️ As an open-source platform, it supports AI development through "learning from practice," from agent construction to training

AWorld: Orchestrating the Training Recipe for Agentic AI
written by Chengyue Yu, Siyuan Lu, Chenyi Zhuang, Dong Wang, Qintong Wu, Zongyue Li, Runsheng Gan, Chunfeng Wang, Siqi Hou, Gaochi Huang, Wenlong Yan, Lifeng Hong, Aohui Xue, Yanfeng Wang, Jinjie Gu, David Tsai, Tao Lin
(Submitted on 28 Aug 2025 (v1), last revised 1 Sep 2025 (this version, v2))
Comments: Published on arxiv.
Subjects: Artificial Intelligence (cs.AI)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Overview

This research focuses on the "learn-by-doing" paradigm, which is considered essential for the development of agentic AI.

While traditional LLMs have demonstrated high performance in many areas, they still face significant challenges when applied to complex, multi-step, real-world tasks.
In particular, for advanced tasks such as the GAIA benchmark, interaction between the model and the environment has been inefficient, and the generation of experience data has been a bottleneck.

The authors therefore proposed an open-source framework called AWORLD.
AWORLD leverages a distributed environment to perform agent-environment interactions on a large scale and efficiently, accelerating experience generation by 14.6 times compared to conventional methods.
This mechanism makes large-scale training using reinforcement learning feasible, and an agent based on Qwen3-32B outperformed GPT-4o in GAIA.

This research provides a foundation for practical, self-improving agent AI through efficient experience generation and optimization of training recipes.

Proposed Methodology

The proposed AWORLD framework is a comprehensive foundation for the process of "learning from practice" in agent AI.
The design consists of four major components.

First, agent construction covers prompt design, tool selection, and collaboration between agents.
Second, a communication protocol provides unified messaging between users and agents, between agents and tools, and between agents themselves, which makes distributed execution robust.
Third, runtime state management uses Kubernetes to run tasks in a highly parallel manner while keeping state consistent, so that large, long-running tasks are handled stably.
Finally, training orchestration integrates with RL frameworks (e.g., SWIFT and OpenRLHF) to collect rollout data efficiently and feed it into training.
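To make the idea of unified messaging more concrete, the following is a minimal sketch of a single message envelope shared by users, agents, and tools. The class name Message, its fields, and the reply helper are hypothetical and chosen only for illustration; they are not taken from AWORLD's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict
import uuid


@dataclass(frozen=True)
class Message:
    """Hypothetical envelope; field names are illustrative, not AWORLD's schema."""
    sender: str                  # e.g. "user", "agent:planner", "tool:search"
    receiver: str
    payload: Dict[str, Any]
    session_id: str              # ties all messages of one task together
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def reply(msg: Message, payload: Dict[str, Any]) -> Message:
    """Build a response that swaps sender and receiver and keeps the session."""
    return Message(
        sender=msg.receiver,
        receiver=msg.sender,
        payload=payload,
        session_id=msg.session_id,
    )


# An agent calls a tool, and the tool answers using the same envelope type.
request = Message("agent:planner", "tool:search", {"query": "GAIA benchmark"}, session_id="s1")
response = reply(request, {"results": []})
print(response.sender, "->", response.receiver)
```

Because every hop uses the same envelope, a router or logger would only need to understand one format regardless of who is talking to whom.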

A particular feature of this system is that it greatly improves efficiency in the exploration phase, which makes large-scale reinforcement learning, previously difficult in practice, feasible.
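The benefit of running many agent-environment interactions at once can be illustrated with a small local sketch. It assumes that each rollout spends most of its time waiting on tools or the environment, and it uses a thread pool as a stand-in for distributed workers; the article describes AWORLD as relying on Kubernetes, not threads. The functions run_rollout and collect_rollouts are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_rollout(task_id: int) -> dict:
    """Stand-in for one agent-environment episode (hypothetical)."""
    time.sleep(0.05)  # simulates waiting on tools or the environment
    return {"task_id": task_id, "reward": float(task_id % 2)}


def collect_rollouts(task_ids, max_workers: int) -> list:
    """Run rollouts concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_rollout, task_ids))


task_ids = list(range(8))

start = time.perf_counter()
sequential = collect_rollouts(task_ids, max_workers=1)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
rollouts = collect_rollouts(task_ids, max_workers=8)
parallel_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

The collected list of rollouts is what a training framework would then consume as a batch of experience.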

Experiments

The authors verified the effectiveness of the proposed method on the GAIA benchmark.

First, they confirmed that an increase in the number of rollouts is directly related to performance improvement.
For example, for Claude-3.7-Sonnet and GPT-4o, increasing the number of trials doubled the success rate, indicating the importance of the quantity and quality of experience generation.
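One common way to quantify how success improves with more attempts is the pass@k metric. The sketch below uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n attempts were sampled and c of them were correct. This summary does not state which estimator the authors used for this comparison, so the choice is an assumption, and the numbers in the example are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts contain a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative values only: 10 attempts on a task, 2 of them successful.
for k in (1, 3, 5):
    print(k, round(pass_at_k(n=10, c=2, k=k), 3))
```

Even with a low per-attempt success rate, the chance that at least one of several attempts succeeds rises quickly, which is why the ability to generate many rollouts cheaply matters.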

Next, a comparison of AWORLD's distributed environment with conventional single-node execution showed a 14.6-fold speedup, reducing rollout time from 7695 seconds to 525 seconds.
Furthermore, an agent trained with AWORLD on top of Qwen3-32B recorded a pass@1 accuracy of 32.23% on the GAIA test set, outperforming GPT-4o (27.91%) and performing comparably to DeepSeek-V3.
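The reported figures can be checked with simple arithmetic. The snippet below uses only the numbers quoted above; the variable names are illustrative.

```python
# Rollout times quoted in the article (seconds).
single_node_seconds = 7695
distributed_seconds = 525

speedup = single_node_seconds / distributed_seconds
saved_seconds = single_node_seconds - distributed_seconds

# pass@1 accuracies on the GAIA test set quoted in the article (percent).
qwen3_32b_aworld = 32.23
gpt_4o = 27.91
gap_points = qwen3_32b_aworld - gpt_4o

print(f"speedup: {speedup:.2f}x, time saved: {saved_seconds} s")
print(f"accuracy gap: {gap_points:.2f} percentage points")
```

The ratio works out to roughly 14.66, which the article reports as 14.6, and the accuracy gap over GPT-4o is about 4.3 percentage points.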

The results exceeded those of existing commercial models, especially on the more challenging problems, demonstrating that AWORLD is an effective foundation for improving the capability of agent AI on complex reasoning tasks.

nakata
