A Framework For Simulating Collaboration Between AI Agents And Others In A Virtual Environment Is Now Available!
3 main points
✔️ Propose a new framework for using LLMs for multi-agent simulation
✔️ Conduct agent-to-agent or human-to-agent experiments using two benchmarks
✔️ Experimental results demonstrate LLMs' ability to plan and communicate in collaborative work
Building Cooperative Embodied Agents Modularly with Large Language Models
written by Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
(Submitted on 5 Jul 2023)
Comments: Project page: this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Large Language Models (LLMs) have been shown to understand natural language and acquire complex inference capabilities, demonstrating remarkable performance across a variety of domains.
More recently, numerous experiments have been conducted in which a single agent generated using LLMs has been asked to simulate human behavior, demonstrating that it can serve as an excellent planner for complex tasks.
On the other hand, although proper communication is essential for generated agents to collaborate with other agents and with humans, no prior research designed for multi-agent or human-agent collaboration has demonstrated these capabilities.
Against this background, this paper proposes a new framework for using LLMs in multi-agent simulations and describes the first systematic analysis of LLMs' planning and communication abilities in collaborative work through experiments in a virtual environment.
Framework for cooperative agents with large-scale language models
In this paper, a new framework consisting of five modules is proposed: the Observation Module, Belief Module, Communication Module, Reasoning Module, and Planning Module, designed to support multi-agent simulations in a variety of previously untested virtual environments.
An overview of the framework is shown in the figure below.
To enable cooperation between agents or between agents and humans, it is necessary to perceive and extract information about the surroundings from the virtual environment.
To make this possible, the framework incorporates the Observation Module as its first module, which processes information received from the virtual environment and extracts features such as the visual scene graph, objects, a map of the virtual environment, and the locations of other agents.
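As a rough illustration of this step, the sketch below turns a raw environment frame into the structured fields listed above. The input schema and field names are assumptions for illustration, not the paper's actual interface.

```python
# Hypothetical sketch of an Observation Module: extract structured
# information from a raw frame returned by the virtual environment.
# The raw_frame schema is assumed, not taken from the paper.
def observe(raw_frame):
    return {
        "scene_graph": raw_frame.get("scene_graph"),  # visual scene graph
        "objects": raw_frame.get("objects", {}),      # visible objects
        "map": raw_frame.get("map"),                  # map of the environment
        "agents": raw_frame.get("agents", {}),        # other agents' locations
    }
```

Missing fields simply come back empty, so downstream modules can rely on a fixed structure regardless of what the environment happened to expose in a given step.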
Since LLMs cannot retain memories of previously seen information or of past interactions with others, the Belief Module is incorporated to effectively store and update information about the physical environment and the state of other agents.
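A minimal sketch of such a belief store, assuming it simply keeps the latest known entry for each object and agent (the class and field names are illustrative, not from the paper):

```python
# Minimal belief store: the LLM itself is stateless, so observed
# information about objects and other agents is cached here and
# overwritten whenever a newer observation arrives.
class BeliefModule:
    def __init__(self):
        self.object_beliefs = {}  # object id -> last observed info
        self.agent_beliefs = {}   # agent id -> last known state

    def update(self, observation):
        # Newer observations overwrite stale entries; unseen entries persist.
        for obj_id, info in observation.get("objects", {}).items():
            self.object_beliefs[obj_id] = info
        for agent_id, info in observation.get("agents", {}).items():
            self.agent_beliefs[agent_id] = info
```

The key design point is that beliefs persist across steps: an object seen once remains in memory until a later observation contradicts it, which is exactly what a stateless LLM cannot do on its own.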
In a multi-agent simulation, effective communication with others is important, and enabling it requires solving two problems: what to send as a message and when to send it.
The Communication Module addresses these two problems by using the LLM directly for message generation, with prompts consisting of an Instruction Head, Goal Description, State Description, Action History, and Dialogue History.
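The five prompt components above can be assembled along the following lines. This is a hedged sketch: the section order matches the summary, but the exact wording and formatting of the paper's prompts are not shown here, so the strings are placeholders.

```python
# Hypothetical assembly of a Communication Module prompt from the five
# components named above. The actual phrasing used in the paper differs.
def build_comm_prompt(goal, state, action_history, dialogue_history):
    sections = [
        "You are a cooperative agent in a virtual household.",  # Instruction Head
        f"Goal: {goal}",                                        # Goal Description
        f"Current state: {state}",                              # State Description
        "Previous actions:\n" + "\n".join(action_history),      # Action History
        "Dialogue so far:\n" + "\n".join(dialogue_history),     # Dialogue History
        "What message, if any, should you send to your partner?",
    ]
    return "\n\n".join(sections)
```

Ending the prompt with "if any" lets a single LLM call decide both *what* to send and *when* to stay silent, the two problems identified above.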
Using the information gathered by the previous modules, the agent must integrate its current state, the state of other agents and the surrounding environment, the goal of the task, the actions it has taken, and the messages it has received in order to decide what to do next.
To create such a plan, the framework incorporates a Reasoning Module that uses prompts designed similarly to those of the Communication Module to reason over all of this information and generate a high-level plan.
For agents to accomplish complex tasks within a virtual environment, these high-level plans must ultimately be translated into concrete actions.
However, existing research has shown that LLMs tend to be good at making high-level plans but poor at making low-level ones.
Therefore, the Planning Module was designed to generate low-level plans that follow the high-level plans produced by the Reasoning Module.
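One simple way to picture this division of labor is a lookup from high-level steps to primitive actions. The plan and action names below are purely illustrative assumptions; the paper's Planning Module is not shown in this summary.

```python
# Hypothetical expansion of a high-level plan (from the Reasoning Module)
# into primitive low-level actions (the Planning Module's job).
# Step and action names are made up for illustration.
HIGH_TO_LOW = {
    "get plate from kitchen": ["walk to kitchen", "walk to plate", "grab plate"],
    "put plate on table": ["walk to table", "put plate on table"],
}

def plan_low_level(high_level_plan):
    actions = []
    for step in high_level_plan:
        actions.extend(HIGH_TO_LOW.get(step, []))
    return actions
```

The point of the split is that the LLM only has to produce the short, abstract step list; turning each step into executable primitives is delegated to this simpler, more reliable layer.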
In this paper, two benchmarks were set up to conduct multi-agent simulation experiments in a virtual environment with the proposed module.
Communicative Watch-And-Help (C-WAH)
Communicative Watch-And-Help (C-WAH) is a multi-agent simulation benchmark that extends the existing Watch-And-Help Challenge task for single agents.
The benchmark is built on VirtualHome-Social, a multi-agent simulation platform, and defines five common household tasks: preparing tea, washing dishes, preparing a meal, putting away groceries, and setting the dining table.
The two evaluation metrics are Average Steps, the average number of steps taken to complete a task, and Efficiency Improvement (EI), which measures the gain in task efficiency from cooperating with other agents.
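Assuming EI is computed as the relative reduction in steps compared with acting alone (a plausible reading of "improvement in task efficiency through cooperation"; the paper's exact formula is not reproduced in this summary), it could look like this:

```python
# Assumed definition of Efficiency Improvement (EI): the fraction of
# steps saved by cooperating, relative to completing the task alone.
def efficiency_improvement(steps_single, steps_coop):
    return (steps_single - steps_coop) / steps_single

# e.g. 100 steps alone vs 75 steps when cooperating -> EI = 0.25
```

Under this definition, EI > 0 means cooperation helped, EI = 0 means it made no difference, and EI < 0 means the agents interfered with each other.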
ThreeDWorld Multi-Agent Transport (TDW-MAT)
The second benchmark is ThreeDWorld Multi-Agent Transport (TDW-MAT), which extends the ThreeDWorld Transport Challenge, an existing single-agent task, by adding more types of objects and containers (as shown in the figure below), configuring a more realistic arrangement of objects, and supporting communication between agents in a multi-agent setting.
The benchmark is built on a simulation platform called TDW, where agents are asked to use containers to transport as many objects as possible to the goal position, which is set as shown in the figure below (goal position = red, objects = blue, containers = green, agent = light blue, other agent = yellow).
The two evaluation metrics are the Transport Rate (TR), the percentage of objects transported to the goal position, and the Efficiency Improvement (EI) described above.
The results of the two experiments are shown in the table below.
HP in the table represents agents designed to act on the basis of a simple hierarchical plan called Hierarchical Planner, and LLM represents agents designed using this framework.
As the table shows, in both experiments the HP agents completed the tasks more efficiently when cooperating than when acting alone, and the LLM agents achieved the highest performance when cooperating with each other.
In addition, to elucidate the essential elements of effective cooperative behavior among agents, this paper qualitatively analyzed the agents' behavior in the experiment and identified several cooperative behaviors, as shown in the figure below.
For example, in Figure a, the male agent (Bob) proposes a plan in which he goes to the kitchen while the female agent (Alice) checks the other rooms, but Alice proposes a better plan considering the situation that she is already in the kitchen.
The agents also understood when deliberately not communicating was effective: in Figure c, after Bob had shared the situation in response to Alice's suggestion and had just found the target object, a plate, he judged that it would be more efficient to complete the task with the object on his own, and chose not to communicate with Alice.
In addition, there was collaborative work between a female agent controlled by an actual human and a male agent using LLM, as shown in the figure below.
In this experiment, as in the agent-to-agent case, the human and the LLM agent communicated well and shared routes for finding objects, allowing them to complete the task efficiently.
Overall, this experiment demonstrated great potential for building cooperative agents that can successfully collaborate with humans using LLMs.
How was it? In this article, we introduced a paper that proposes a new framework for using LLMs in multi-agent simulations and presents the first systematic analysis of LLMs' planning and communication abilities in collaborative work through experiments in a virtual environment.
Although the LLM agents used in this experiment were able to communicate appropriately with their surroundings and take correct actions in most situations, they still occasionally misunderstood the instructions given in the prompts or made incorrect inferences.
To resolve these shortcomings, it will be necessary not only to improve the framework but also to develop LLMs with better prompt-following and reasoning capabilities, and future progress will be closely watched.
The details of the framework and experimental results presented here can be found in this paper for those interested.