CHATDEV, A Virtual Company Of AI Agents Developing Software!
3 main points
✔️ Founded CHATDEV, a virtual software technology company with AI agents working in various positions
✔️ Proposed the chat chain, a framework that breaks down the software development process into multiple subtasks
✔️ Experiments demonstrated CHATDEV's efficiency and cost-effectiveness in the software development process
Communicative Agents for Software Development
written by Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, Maosong Sun
(Submitted on 16 Jul 2023 (v1), last revised 18 Jul 2023 (this version, v2))
Comments: 25 pages, 9 figures, 2 tables
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Multiagent Systems (cs.MA)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Large Language Models (LLMs) have shown remarkable performance in a wide range of tasks such as question answering, machine translation, and code generation, and recently there has been a lot of research into simulating human behavior by generating multiple AI agents.
Against this backdrop, and because both code and documentation, the core of software development, can be viewed as language (i.e., sequences of characters), researchers have been looking for ways to use LLMs to reduce the cost of software development.
However, the following issues have been raised with generating software using LLMs:
- As in question-answering tasks, LLMs hallucinate (output plausible statements that are not true), which can lead to incomplete function implementations, missing dependencies, potential bugs, etc.
- In software development, checks from others, such as code reviews and feedback on suggestions, are essential, and the absence of such third parties is a serious risk.
This article describes a paper that created CHATDEV, a virtual software technology company, to address the above issues. Its development process is based on mutual communication between generative agents in various positions, such as programmers, test engineers, and art designers, which enables consistent software development.
As mentioned earlier, it has been pointed out that using LLMs to directly generate entire software systems can cause a variety of problems, much like the hallucination seen in question-answering tasks.
The authors of this paper believe that these problems are due to a lack of task specificity and a lack of mutual confirmation, and proposed the creation of CHATDEV, a virtual software technology company, as shown in the figure below.
CHATDEV consists of agents with diverse identities, such as programmers, test engineers, and art designers. When presented with a task (e.g., develop a five-in-a-row game), these agents communicate and validate each other's work through chat to develop the necessary software, including a workable system, guidelines, and a user manual.
CHATDEV employs the waterfall model, which is widely adopted as a software development life cycle model, and divides the software development process into four phases: designing, coding, testing, and documenting. In software development, effective communication among agents in multiple positions in each phase is essential.
To enable such smooth communication, this paper proposes an architecture called the chat chain, which decomposes each phase into multiple smaller subtasks.
The chat chain consists of two levels, Phase-Level and Chat-Level, as shown in the figure below.
Phase-Level uses a waterfall model to divide the software development process into four consecutive phases (designing, coding, testing, and documenting).
In Chat-Level, each phase is further broken down into chats called Atomic Chats, in which task-oriented role-playing takes place between two agents to facilitate collaborative communication.
The structure is such that as agents communicate to accomplish specific subtasks within each chat, the desired output is produced.
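The two-level structure described above can be sketched as a small data model. This is only an illustrative sketch, not the authors' implementation: the class names `AtomicChat` and `Phase`, the functions `build_chat_chain` and `run_chat_chain`, and the specific subtasks and role pairings are assumptions made for this example. Only the four phase names and the idea of two-agent chats come from the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AtomicChat:
    """One task-oriented chat between exactly two roles (hypothetical structure)."""
    instructor: str
    assistant: str
    subtask: str


@dataclass
class Phase:
    """One waterfall phase, decomposed into an ordered list of atomic chats."""
    name: str
    chats: List[AtomicChat] = field(default_factory=list)


def build_chat_chain() -> List[Phase]:
    # Phase names follow the article; subtasks and pairings are illustrative.
    return [
        Phase("designing", [
            AtomicChat("CEO", "CPO", "decide the product form"),
            AtomicChat("CEO", "CTO", "decide the programming language"),
        ]),
        Phase("coding", [
            AtomicChat("CTO", "programmer", "write the source code"),
            AtomicChat("programmer", "art designer", "design the GUI assets"),
        ]),
        Phase("testing", [
            AtomicChat("reviewer", "programmer", "peer review the code"),
            AtomicChat("tester", "programmer", "fix errors found by system testing"),
        ]),
        Phase("documenting", [
            AtomicChat("CTO", "programmer", "write requirements.txt"),
            AtomicChat("CEO", "CPO", "write the user manual"),
        ]),
    ]


def run_chat_chain(
    chain: List[Phase],
    run_chat: Callable[[AtomicChat, str], str],
) -> Tuple[str, List[str]]:
    """Run every chat in order, feeding each chat's output into the next one."""
    artifact = ""
    log: List[str] = []
    for phase in chain:
        for chat in phase.chats:
            artifact = run_chat(chat, artifact)
            log.append(f"{phase.name}: {chat.instructor} -> {chat.assistant}")
    return artifact, log
```

In a real system `run_chat` would drive an LLM-backed dialogue between the two roles; here it can be any function that takes a chat and the previous output and returns the new output, which keeps the chain itself easy to test.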
Next, we will explain each of the four Phase-Level phases: designing, coding, testing, and documenting.
The designing phase is the phase in which CHATDEV receives the first idea from the human client and consists of three functions: Role Specialization, Memory Stream, and Self-Reflection, as shown in the figure below.
In Role Specialization, agents are assigned to three positions: CEO (Chief Executive Officer), CPO (Chief Product Officer), and CTO (Chief Technology Officer). Each agent then communicates in accordance with its designated position.
Memory Stream is a mechanism proposed in existing research that keeps a record of an agent's previous communications to support subsequent decision making.
Self-Reflection is a mechanism that addresses the problem of a chat not terminating even though the agents have reached a conclusion that satisfies a predefined termination condition (e.g., determining the appropriate programming language for the task).
In this mechanism, a Pseudo Questioner agent is created separately from the chatting agents. The Pseudo Questioner records and summarizes the communication history to date and prompts the agents to terminate by asking questions that lead to a conclusion satisfying the termination condition.
Based on these mechanisms, agents in each position discuss and decide on the detailed parts of the software, as shown in the figure below.
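To make the Memory Stream and Self-Reflection mechanisms more concrete, here is a minimal sketch in which plain Python callables stand in for the LLM agents. Everything in it is an assumption made for illustration: the `<DONE>` marker, the turn limit, and the names `find_conclusion` and `chat_with_reflection` are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (speaker, text)


def find_conclusion(text: str) -> Optional[str]:
    """Hypothetical termination check: a reply of the form '<DONE> answer'."""
    marker = "<DONE>"
    if text.startswith(marker):
        return text[len(marker):].strip()
    return None


def chat_with_reflection(
    instructor: Callable[[List[Message]], str],
    assistant: Callable[[List[Message]], str],
    pseudo_questioner: Callable[[str], str],
    max_turns: int = 3,
) -> Tuple[str, List[Message]]:
    """Run a two-agent chat; fall back to self-reflection if it never terminates."""
    memory: List[Message] = []  # memory stream: every utterance so far
    for _ in range(max_turns):
        # Both agents see the full history before speaking.
        memory.append(("instructor", instructor(memory)))
        reply = assistant(memory)
        memory.append(("assistant", reply))
        conclusion = find_conclusion(reply)
        if conclusion is not None:
            return conclusion, memory
    # Self-reflection: summarize the history and ask a question that
    # forces a conclusion satisfying the termination condition.
    summary = "\n".join(f"{who}: {text}" for who, text in memory)
    question = "Based on this chat, state the final decision:\n" + summary
    return pseudo_questioner(question), memory
```

For example, if the assistant stub keeps answering without ever emitting the marker, the function returns whatever the pseudo questioner stub produces after `max_turns` rounds, together with the full memory of the chat.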
In the coding phase, agents are assigned to three positions: CTO (Chief Technology Officer), programmer, and art designer.
The CTO then instructs the programmer, in markdown format, to implement the software system based on the design determined in the previous phase, and the programmer generates code in response.
To handle complex software systems, CHATDEV utilizes object-oriented programming languages such as Python, and programmers manage projects using Git-related commands.
Even for human programmers, it is rare that the initial code they write does not contain errors, and to address these errors, they must analyze and investigate the results of code execution and correct implementation errors.
The testing phase assigns agents to three positions to perform this series of tasks: programmer, reviewer, and tester.
The process consists of two tasks: peer review and system testing. Peer review identifies potential problems by inspecting source code, while system testing verifies software execution through tests conducted by programmers using an interpreter.
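The system-testing idea, running the generated code in an interpreter and feeding any error back for correction, can be sketched as follows. This is a simplified illustration under stated assumptions, not CHATDEV's actual code: `run_source` and `repair_until_runs` are hypothetical names, the `fix` callable stands in for the programmer agent, and the default of 5 rounds simply mirrors the limit the article reports for the testing phase.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_source(source: str, timeout: float = 10.0) -> Optional[str]:
    """Run source in a fresh interpreter; return the error text, or None on success."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "main.py"
        script.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return "timed out"
    if result.returncode != 0:
        return result.stderr
    return None


def repair_until_runs(
    source: str,
    fix: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Alternate between executing the code and asking `fix` to repair it."""
    for _ in range(max_rounds):
        error = run_source(source)
        if error is None:
            return source, True
        # Hand the failing code and the interpreter's error back for repair.
        source = fix(source, error)
    # The last repair has not been executed yet, so check it once more.
    return source, run_source(source) is None
```

A sketch like this only checks that the program exits without an error; it does not replace peer review, which inspects the source code itself.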
After the designing, coding, and testing phases, CHATDEV assigns agents to the following four positions: CEO (Chief Executive Officer), CPO (Chief Product Officer), CTO (Chief Technology Officer), and programmer, as shown in the figure below, and has them create relevant documents such as software specifications and user manuals.
The CTO instructs the programmer about environment-dependent settings and has the programmer create documents such as requirements.txt; similarly, the CEO instructs the CPO on requirements and system design and has the CPO create user manuals.
The user manual contains comprehensive information about the software's technical architecture, installation procedures, and functionality. The documentation and manuals allow human users to build their own environment and actually run the software using the appropriate interpreter.
In this paper, experiments were conducted under the following conditions to demonstrate the effectiveness of software development with CHATDEV:
- ChatGPT's gpt3.5-turbo-16k is used as the large language model
- In the coding phase, up to 5 attempts are allowed before code completion
- In the Testing phase, up to 5 chats are allowed to propose modifications and test the software system
- For Python-based systems, use Python 3.8.16 as the interpreter for testing
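The conditions listed above can be gathered into a single configuration object, which makes the limits explicit and easy to check. This is a hypothetical sketch: the class `ExperimentConfig`, its field names, and the two helper functions are assumptions made for illustration, while the values are the ones stated in the list.

```python
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Experimental conditions as listed in the article (field names are hypothetical)."""
    model: str = "gpt3.5-turbo-16k"   # model label exactly as the article writes it
    max_coding_attempts: int = 5      # attempts allowed before code completion
    max_testing_chats: int = 5        # chats allowed to propose fixes and test
    python_version: str = "3.8.16"    # interpreter used for Python-based systems


def interpreter_matches(config: ExperimentConfig) -> bool:
    """Check whether the running interpreter is the version used in the experiments."""
    return platform.python_version() == config.python_version


def attempts_left(used: int, limit: int) -> int:
    """Number of attempts still available under a given limit (never negative)."""
    return max(limit - used, 0)
```

Freezing the dataclass prevents the limits from being changed accidentally in the middle of a run.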
Under these conditions, a statistical analysis of the software system generated by CHATDEV was conducted to gather comprehensive information on the development of CHATDEV, including file structure, code complexity, and version control.
The results of the analysis are shown in the figure below.
Of particular note in these results is the number of lines of code in the software developed by CHATDEV: it tends to create software with relatively few lines of code, typically between 39 and 359 lines, with an average of 131.26 lines.
This is likely due to the design of object-oriented programming languages, and one can infer that code reuse reduces redundancy.
In addition, the source code generated by CHATDEV tends to be shorter when the user specifies a less specific task, suggesting that giving CHATDEV a clearer, more specific task may encourage it to generate code that is more in line with the user's requirements.
The table also shows that the software version is updated an average of 13.23 times, which indicates that the source code is modified about 13 times on average. This reflects the fact that the code is repeatedly modified through mutual communication between agents throughout the software development process.
The figure below illustrates the development process of the five-in-a-row game that CHATDEV built in the experiment.
The leftmost screen represents software created without a GUI, which can only be played from a command terminal and has a plain design. The programmer agent then incorporated a GUI design, and the designer agent added graphics, making the game visually clear and more appealing.
In addition, CHATDEV has proven flexible enough to let users customize the software after it is completed if they are not satisfied with the designer agent's design.
How was it? In this article, we described a paper that established CHATDEV, a virtual software technology company, and enabled consistent software development through a development process based on mutual communication between generative agents in various positions, such as programmers, test engineers, and art designers.
While the results of this experiment demonstrate the effectiveness of software development with CHATDEV, the following issues remain:
- Large language models have been found to have inherent biases, and there is a risk that the programmer agents will generate code patterns that do not match the thinking of human programmers
- There is a risk of malicious users abusing this framework
On the other hand, integrating technologies from other fields, for example incorporating reinforcement learning to optimize task-solving strategies, may make software development even more efficient, so future progress will be closely watched.
The details of the architecture and experimental results presented in this article can be found in this paper for those who are interested.