MetaGPT, A Multi-agent Framework In Which AI Consistently Develops Systems, Is Now Available!

Agent Simulation

3 main points
✔️ Propose MetaGPT, a multi-agent framework that encodes SOPs incorporating real-world expertise into LLM agents
✔️ Experiments demonstrate that it can generate more consistent and comprehensive solutions compared to existing methods
✔️ Reduces costs to less than 1/1000th of those of traditional software engineering

MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
written by Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
(Submitted on 1 Aug 2023 (v1), last revised 17 Aug 2023 (this version, v4))
Comments: Published on arXiv.

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

code:
 

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

While significant progress has been made in recent years on task solving with multi-agent systems built on large language models (LLMs), existing research has focused mainly on simple tasks and has not investigated complex tasks that are prone to hallucination (the phenomenon in which an LLM outputs plausible falsehoods).

Such hallucination is more likely to occur as multiple intelligent agents interact, which has been a challenge when tackling complex problems.

To resolve these issues, the authors of this paper turned to Standardized Operating Procedures (SOPs), which are widely used to coordinate human collaborative work.

SOPs are important for supporting task decomposition and efficient coordination. In software engineering, for example, the waterfall model defines phases such as requirements analysis, system design, coding, testing, and delivery.

This paper presents MetaGPT, a multi-agent framework that extends complex problem-solving capabilities by encoding SOPs that incorporate real-world expertise into LLM agents, and shows through experiments that it can generate more consistent and comprehensive solutions than existing methods.

Framework Overview

The MetaGPT design is divided into two layers, the Foundational Components Layer and the Collaboration Layer, each responsible for a different set of system functions.

Foundational Components Layer

This layer defines in detail the core components needed for individual agent operation and system-wide information exchange, namely Environment, Memory, Role, Action, and Tools, and provides the fundamental capabilities required for collaborative work.

The overall picture is shown in the figure below.

A description of each component follows, together with a minimal illustrative sketch after the list.

  • Environment: Provides a collaborative workspace and communication platform for agents
  • Memory: Facilitates agents to store and retrieve past messages and context
  • Role: Encapsulates specialized skills, behaviors, and workflows based on domain expertise
  • Action: Steps taken by the agent to accomplish a subtask and produce an output
  • Tools: A collection of utilities and services available to enhance the agent's capabilities
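
To make the division of responsibilities more concrete, here is a minimal, hypothetical Python sketch of how these five components could be modeled. The class names mirror the terminology above, but every attribute and method is an illustrative assumption rather than the actual MetaGPT implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch of the five foundational components; class names follow the
# paper's terminology, but attributes and methods are illustrative assumptions.

@dataclass
class Message:
    role: str     # which Role produced the message
    content: str  # natural-language or structured output

@dataclass
class Environment:
    """Shared workspace and communication platform for all agents."""
    messages: List[Message] = field(default_factory=list)

    def publish(self, msg: Message) -> None:
        self.messages.append(msg)

@dataclass
class Memory:
    """Lets an agent store and retrieve past messages and context."""
    history: List[Message] = field(default_factory=list)

    def add(self, msg: Message) -> None:
        self.history.append(msg)

    def recall(self, sent_by: str) -> List[Message]:
        return [m for m in self.history if m.role == sent_by]

@dataclass
class Action:
    """A single step a Role takes to complete a subtask and produce output."""
    name: str
    run: Callable[[str], str]  # maps an instruction to an output

@dataclass
class Tools:
    """Registry of utilities and services that Actions may call."""
    registry: Dict[str, Callable] = field(default_factory=dict)

@dataclass
class Role:
    """Encapsulates domain-specific skills as a set of Actions plus its own Memory."""
    name: str
    actions: List[Action]
    memory: Memory = field(default_factory=Memory)
```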

The MetaGPT framework generates agents for various Roles, such as ProductManager and Architect, each initialized with Role-specific settings, much as an engineer at a software company works within a clearly defined job description and set of responsibilities.

These Role configurations allow users of the MetaGPT framework to create highly specialized LLM agents for specific domains and purposes.
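
As a purely illustrative example, such Role-specific settings could be expressed as a small configuration object. The sketch below uses field names suggested by the paper's description (profile, goal, constraints); it is an assumption-level sketch, not MetaGPT's actual constructor.

```python
from dataclasses import dataclass

# Hypothetical role settings; the role names follow the paper, while the field
# names and example values are assumptions, not MetaGPT's real API.

@dataclass
class RoleSetting:
    name: str         # the agent's display name
    profile: str      # one-line job description injected into the system prompt
    goal: str         # what this role is expected to deliver
    constraints: str  # standards its output must follow

product_manager = RoleSetting(
    name="Alice",
    profile="Product Manager",
    goal="Produce a clear, comprehensive PRD from the user's one-line request",
    constraints="Follow the standardized PRD template; do not write code",
)

architect = RoleSetting(
    name="Bob",
    profile="Architect",
    goal="Design a concise, complete system architecture with file list and APIs",
    constraints="Express the design as class and sequence diagrams",
)
```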

Collaboration Layer

This layer, built on top of the Foundational Components Layer, provides two mechanisms for solving complex tasks collaboratively: Knowledge Sharing and Encapsulating Workflows.

  • Knowledge Sharing: This mechanism allows agents to efficiently exchange information and store, retrieve, and share data at various granularities. It not only improves collaboration, but also reduces redundant communication and improves overall operational efficiency (a rough sketch of this idea follows the list).
  • Encapsulating Workflows: This mechanism uses SOPs to break down complex tasks into smaller, more manageable subtasks. These subtasks are assigned to appropriate agents and their performance is monitored through standardized output.
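
As a rough illustration of Knowledge Sharing, agents can be pictured as publishing their outputs to a shared message pool and reading back only the message types they need. The sketch below is an assumption-level model of that idea (the class and method names are invented for this example), not MetaGPT's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class MessagePool:
    """Shared pool: every role publishes its output here and reads back only the
    message types it has subscribed to, which reduces redundant communication."""

    def __init__(self) -> None:
        self._messages: List[Tuple[str, str]] = []                     # (topic, content)
        self._subscriptions: Dict[str, List[str]] = defaultdict(list)  # role -> topics

    def subscribe(self, role: str, topic: str) -> None:
        self._subscriptions[role].append(topic)

    def publish(self, topic: str, content: str) -> None:
        self._messages.append((topic, content))

    def fetch(self, role: str) -> List[str]:
        topics = set(self._subscriptions[role])
        return [content for topic, content in self._messages if topic in topics]

# Example: the Architect pulls only the PRD, not the raw user conversation.
pool = MessagePool()
pool.subscribe("Architect", "PRD")
pool.publish("user_request", "Build a command-line blackjack game")
pool.publish("PRD", "Requirements: single-player blackjack with a text UI ...")
print(pool.fetch("Architect"))  # ['Requirements: single-player blackjack with a text UI ...']
```

Encapsulating Workflows then amounts to fixing which role publishes which kind of message, and in what order, so that subtask hand-offs follow the SOP.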

This separation of the Foundational Components Layer and the Collaboration Layer promotes modularity while ensuring both individual and collective capabilities of the agents.

Core Component Design

MetaGPT breaks high-level tasks down into SOP-defined components that are processed by various Role agents (ProductManager, Architect, ProjectManager, Engineer, and Quality Assurance Engineer), a structure designed so that each Role's specialized expertise is applied where it is most effective.
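
The waterfall-style hand-off between these Role agents can be pictured as a fixed pipeline in which each agent consumes the previous agent's standardized output. The sketch below is a simplified, hypothetical rendering of that control flow; the `run_llm` helper and the prompt templates are stand-ins invented for this example, not MetaGPT's actual prompts.

```python
from typing import Callable, List, Tuple

# Stand-in for a real LLM call; in practice this would query GPT-4 or a similar model.
def run_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

# Each stage: (role name, prompt template that wraps the previous stage's output).
SOP_PIPELINE: List[Tuple[str, str]] = [
    ("ProductManager", "Write a PRD for this request:\n{input}"),
    ("Architect",      "Design the system architecture for this PRD:\n{input}"),
    ("ProjectManager", "Break this design into concrete coding tasks:\n{input}"),
    ("Engineer",       "Implement the following tasks as code:\n{input}"),
    ("QAEngineer",     "Write tests for this code and report any defects:\n{input}"),
]

def run_sop(user_request: str, llm: Callable[[str], str] = run_llm) -> str:
    """Pass the artifact down the waterfall: each role's output is the next role's input."""
    artifact = user_request
    for role, template in SOP_PIPELINE:
        artifact = llm(template.format(input=artifact))
        print(f"[{role}] produced {len(artifact)} characters of output")
    return artifact
```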

The figure below provides a schematic of the software development process in the MetaGPT framework.

Initially, when MetaGPT receives a request from a human, the ProductManager agent initiates the process by performing a request analysis and a feasibility analysis.

The Architect agent then generates a detailed system blueprint showing the software architecture, as shown in the figure below.

The diagram includes definitions of important modules such as User, CollaborativeFilteringModel, and Recommender, as well as details on the fields and methods of each module.
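
As a purely illustrative rendering, a blueprint entry of this kind might resemble the interface sketch below. The module names (User, CollaborativeFilteringModel, Recommender) come from the figure, but the specific fields and method signatures are guesses made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class User:
    """A user of the recommender system; fields are illustrative assumptions."""
    user_id: int
    ratings: Dict[int, float]  # item_id -> rating given by this user

class CollaborativeFilteringModel:
    """Learns preferences from rating histories (interface only, no implementation)."""

    def fit(self, users: List[User]) -> None:
        """Estimate item-item similarities from the users' rating histories."""
        ...

    def predict(self, user: User, item_id: int) -> float:
        """Estimate how the given user would rate an unseen item."""
        ...

class Recommender:
    """Turns model predictions into a ranked list of recommendations."""

    def __init__(self, model: CollaborativeFilteringModel) -> None:
        self.model = model

    def top_n(self, user: User, n: int = 10) -> List[int]:
        """Return the n item ids with the highest predicted rating for this user."""
        ...
```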

However, system blueprints alone are not sufficient for engineers to implement complex systems; they need additional details about how operations are performed within and across modules in order to translate the design into functional code.

Therefore, the Architect agent also creates a sequence flow diagram, as shown in the figure below, based on the system interface design, depicting the sequence of processes, objects involved, and messages exchanged among them that are necessary to execute the function.

Capturing details in this way facilitates the work of the Engineer agent and the ProjectManager agent, which is responsible for detailed code design.

Finally, the Engineer Agent performs the actual code development and comprehensive testing is performed by the Quality Assurance Engineer.
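
To give a concrete, purely illustrative sense of what the Quality Assurance Engineer's standardized output might look like, here is a minimal unit-test sketch in the style such an agent could produce. The `top_n` helper and the assertions are invented for this example; they are not code generated by MetaGPT.

```python
import unittest

def top_n(scores: dict, n: int = 3) -> list:
    """Toy stand-in for a Recommender's ranking step: highest-scored item ids first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked][:n]

class TestTopN(unittest.TestCase):
    def test_returns_items_in_descending_score_order(self):
        scores = {"a": 0.2, "b": 0.9, "c": 0.5}
        self.assertEqual(top_n(scores, n=2), ["b", "c"])

    def test_never_returns_more_than_n_items(self):
        self.assertLessEqual(len(top_n({"a": 1.0}, n=5)), 5)

if __name__ == "__main__":
    unittest.main()
```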

Thus, it can be seen that the design and implementation of standardized outputs in MetaGPT is very effective in handling complex tasks, especially in clearly and safely expressing structural information that is difficult to convey using natural language alone.

Experiments

In this paper, we conducted experiments comparing MetaGPT with existing methods, AutoGPT and AgentVerse, to demonstrate the usefulness of MetaGPT in various scenarios such as game development, web development, and data analysis.

The table below shows the results of an initial comparison of the functions of each method.

As the table shows, MetaGPT stands out for its rich functionality and has a significant advantage over existing methods, especially in its ability to generate PRDs (product requirements documents) and technical designs.

Comparative experiments were then conducted to evaluate the performance of each framework on seven diverse tasks, such as game generation, code generation, and simple data analysis.

The results are shown in the table below. (0=failed, 1=workable, 2=executed almost as expected, 3=complete success)

As can be read from the table, MetaGPT performed well across this diverse set of tasks, executing successfully on all but two (the Flappy Bird game and the Tank Battle game, which failed due to experimental constraints and limited resources).

In contrast, the existing methods AutoGPT and AgentVerse were not successful in any task, again demonstrating the usefulness of MetaGPT.

Finally, we performed a statistical analysis of MetaGPT in software development.

In the aforementioned experiments with various tasks, there were an average of 4.71 code files per task and an average of 42.99 lines of code per file.

Most notable in this experiment are the cost statistics, which show that MetaGPT reduces the monetary cost of development to less than 1/1000th of that of traditional software engineering ($1.09 on average).

Summary

In this article, we presented MetaGPT, a multi-agent framework that extends complex problem-solving capabilities by encoding SOPs that incorporate real-world expertise into LLM agents; the paper's experiments show that it can generate more consistent and comprehensive solutions than existing methods.

While MetaGPT has great potential for automating end-to-end processes, some problems have been observed during the execution of complex tasks, such as references to non-existent resource files (for example, images or audio) and calls to undefined or unimported classes and variables.

These phenomena stem from the hallucination inherent in large language models, which the authors say can be addressed with clearer and more efficient workflows, so further progress is eagerly awaited.

For those interested, further details of the datasets and baselines presented here can be found in the paper.
