
Simulate The Behavior Of 25 AI Agents In A Virtual Space City!

Large Language Models

3 main points
✔️ An architecture that simulates human behavior by combining a large language model with interactive agents
✔️ Built a virtual game environment and an artificial village society of 25 agents to simulate group behavior
✔️ Validation results confirm that group dynamics emerge from interactions among the agents

Generative Agents: Interactive Simulacra of Human Behavior
written by Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
(Submitted on 7 Apr 2023)
Comments: Published on arXiv.

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.


Simulating human behavior is an important research area in many fields, from studying community formation in virtual spaces to social robots and more realistic game characters in open worlds, and it has been the subject of numerous studies over the years.

However, in existing research, agents can only simulate behavior conditioned on their current environment (e.g., what actions a robot needs to take to enter a room), which makes it impossible to reproduce behavior in complex environments such as the real world.

This article describes a paper that solves this problem with an architecture consisting of three elements, Memory Stream, Reflection, and Planning, and that successfully simulates collective behavior by building an artificial village society of 25 agents generated by this architecture.

Agent Avatar and Communication

To simulate collective behavior, the paper creates a virtual environment called "Smallville," a small town inhabited by 25 agents, inspired by the popular life-simulation game The Sims. (See figure below)

Smallville's virtual environment is built with Phaser, a web game development framework. The JSON data describing the environment is updated with the actions output by the agents and then processed for the next time step. In addition, to set up each agent's profile, such as its occupation and its relationships with other agents, a natural language description like the following is given to the agent as its initial memory.

In this description, the agent is given an initial memory of a man named John Lin, a pharmacy clerk who likes to help people. An agent created this way spontaneously goes about its daily life and builds relationships with other agents in Smallville. In John Lin's case, as shown in the figure below, he wakes up around 6:00 a.m., goes through his morning routine of brushing his teeth, showering, and eating breakfast, exchanges a brief greeting with his wife Mei and son Eddy, and then goes to work.

Another example is an experiment with an agent named Isabella Rodriguez, whose initial memory stated that she was planning a Valentine's Day party at Hobbs Cafe on February 14 from 5:00 to 7:00 p.m.

Isabella began inviting friends and customers to the party at Hobbs Cafe whenever she saw them, and on the afternoon of the 13th she was observed decorating the cafe for the event. On the day of the party, five agents including Isabella gathered at Hobbs Cafe and enjoyed the party, as shown below (IR is Isabella Rodriguez).

In this experiment, only Isabella's initial memory of planning the party was set up by hand; all of the social behaviors that followed, such as spreading the word to friends, decorating the cafe, and socializing with friends on the day of the party, emerged from the agents themselves.

As both examples show, these social behaviors are notably not pre-programmed; they emerge among the agents from their initial memories.
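The environment setup described at the start of this section can be pictured with the minimal sketch below. It assumes a world state held as a nested dictionary of areas and objects that is serialized to JSON after each agent action, and it assumes that a profile description is split into separate memories at semicolons. The area names, `apply_action`, and `seed_memories` are hypothetical illustrations, not the paper's code.

```python
import json

# Hypothetical world state: each area maps its objects to their current status.
world_state = {
    "Hobbs Cafe": {"counter": "idle", "coffee machine": "idle"},
    "Lin family's house": {"bed": "in use"},
}


def apply_action(state: dict, area: str, obj: str, new_status: str) -> str:
    """Update one object's status after an agent acts on it and return the
    JSON snapshot that the next time step would start from."""
    state[area][obj] = new_status
    return json.dumps(state, sort_keys=True)


def seed_memories(description: str) -> list[str]:
    """Turn a profile description into separate initial memories.

    Assumption: each semicolon-separated phrase becomes one memory.
    """
    return [part.strip() for part in description.split(";") if part.strip()]


snapshot = apply_action(world_state, "Hobbs Cafe", "coffee machine", "brewing coffee")
profile = (
    "John Lin is a pharmacy clerk who loves to help people; "
    "John Lin lives with his wife, Mei, and his son, Eddy"
)
print(seed_memories(profile))
```

Serializing the whole state each step keeps the sketch simple; a real server would more likely send only the changed objects.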

Generative Agent Architecture

Underlying the agent behavior described above is a new agent architecture that combines a large language model with mechanisms for synthesizing and retrieving the relevant information used to condition the language model's output. Without these mechanisms, agents cannot behave in a consistent, socially plausible way based on their experience. As shown in the figure below, the architecture consists of Memory Stream, Reflection, and Planning, and combining these components with a large language model (this paper uses ChatGPT's gpt-3.5-turbo) makes it possible to generate agents that exhibit the behaviors described above.

Let's look at them one by one.

Memory Stream

The primary role of the Memory Stream is to keep a record of the agent's experiences. It is a list of memory objects, each of which contains a natural language description, a timestamp of its creation, and a timestamp of its most recent access (see figure below).

The most basic element of the Memory Stream is the Observation, an event directly perceived by the agent. Observations typically include actions the agent performed itself and actions it perceived other agents taking (e.g., rearranging chairs, studying for a test over coffee).
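A memory object and the stream that holds it could look like the sketch below. The first three fields mirror the description, creation timestamp, and last-access timestamp listed above; the extra `kind` field, the `add_observation` helper, and the sample observations are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MemoryObject:
    description: str           # natural-language description of the event
    created_at: datetime       # when the memory was created
    last_accessed: datetime    # when the memory was last retrieved
    kind: str = "observation"  # assumption: "observation", "reflection", or "plan"


@dataclass
class MemoryStream:
    memories: list[MemoryObject] = field(default_factory=list)

    def add_observation(self, description: str, now: datetime) -> MemoryObject:
        """Append an event the agent perceived directly."""
        memory = MemoryObject(description, created_at=now, last_accessed=now)
        self.memories.append(memory)
        return memory


stream = MemoryStream()
t = datetime(2023, 2, 13, 9, 0)
stream.add_observation("Isabella Rodriguez is rearranging chairs", t)
stream.add_observation("Klaus Mueller is studying for a test over coffee", t)
print(len(stream.memories))
```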

Deciding which memories the agent should take into account when choosing an action is important, so the architecture retrieves memories by scoring them on three factors: recency, importance, and relevance.

Recency assigns higher scores to recently accessed memory objects, so that recent actions and events have more influence on the agent's behavior.
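One way to implement the recency score, and the one the original paper describes, is exponential decay over the game hours elapsed since a memory was last accessed. The decay factor of 0.995 per hour used below should be read as an assumption for this sketch.

```python
from datetime import datetime


def recency_score(last_accessed: datetime, now: datetime, decay: float = 0.995) -> float:
    """Exponentially decaying recency score.

    Each hour since the last access multiplies the score by `decay`,
    so a memory accessed just now scores 1.0 and older memories fade toward 0.
    """
    hours_elapsed = (now - last_accessed).total_seconds() / 3600
    return decay ** hours_elapsed


now = datetime(2023, 2, 13, 12, 0)
print(recency_score(now, now))                          # accessed just now
print(recency_score(datetime(2023, 2, 12, 12, 0), now))  # accessed 24 hours ago
```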

Importance distinguishes mundane memories from core ones by assigning higher scores to memory objects that the agent considers important. For example, an everyday event such as eating breakfast in one's room has low importance, while an event such as saying goodbye to an important person has high importance.
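In the original paper, the importance score is obtained by asking the language model itself to rate each memory on a scale of 1 to 10. The sketch below builds such a prompt (its wording is paraphrased, not quoted from the paper) and parses the reply; `fake_llm` is a hypothetical stand-in for a real model call.

```python
import re
from typing import Callable

PROMPT_TEMPLATE = (
    "On a scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth) "
    "and 10 is extremely poignant (e.g., saying goodbye to a loved one), "
    "rate the importance of the following memory.\n"
    "Memory: {memory}\nRating:"
)


def importance_score(memory: str, llm: Callable[[str], str]) -> int:
    """Ask a language model for a 1-10 rating and parse the first integer in its reply."""
    reply = llm(PROMPT_TEMPLATE.format(memory=memory))
    match = re.search(r"\d+", reply)
    if match is None:
        return 1  # fall back to "mundane" if the reply contains no number
    return max(1, min(10, int(match.group())))  # clamp to the 1-10 scale


def fake_llm(prompt: str) -> str:
    """Hypothetical model: rates farewells highly and everything else low."""
    memory_part = prompt.split("Memory:")[1]
    return "9" if "goodbye" in memory_part else "2"


print(importance_score("eating breakfast in my room", fake_llm))
print(importance_score("saying goodbye to an important person", fake_llm))
```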

Relevance assigns higher scores to memory objects related to the current situation. For example, if a student is discussing with classmates what to study for a chemistry test, memory objects about their breakfast will have low relevance, while those about the teacher and schoolwork will have high relevance.
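The original paper computes relevance as the cosine similarity between an embedding of each memory's description and an embedding of the current query. A real system would obtain these vectors from an embedding model; the tiny hand-made vectors below are hypothetical and only illustrate the chemistry-test example above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical 3-d embeddings along the axes [chemistry, school, food].
query = [1.0, 1.0, 0.0]             # "what to study for the chemistry test"
teacher_memory = [0.8, 1.0, 0.0]    # something the teacher said about the test
breakfast_memory = [0.0, 0.1, 1.0]  # what the student ate for breakfast

print(cosine_similarity(query, teacher_memory), cosine_similarity(query, breakfast_memory))
```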

The final retrieval score is obtained by normalizing the recency, importance, and relevance scores to the range [0, 1] with min-max scaling and then taking a weighted sum of the three.
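Putting the three factors together, the sketch below min-max scales each factor across the candidate memories and ranks them by score = w_rec * recency + w_imp * importance + w_rel * relevance. Equal weights of 1.0 are an assumption here, as are the `Candidate` type, the `top_k` parameter, and the sample scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    description: str
    recency: float     # raw recency score, e.g. from exponential decay
    importance: float  # raw importance rating, e.g. 1-10
    relevance: float   # raw relevance, e.g. cosine similarity


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; if all values are equal, map them all to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(candidates: list[Candidate], top_k: int = 2,
             w_rec: float = 1.0, w_imp: float = 1.0, w_rel: float = 1.0) -> list[str]:
    """Rank memories by the weighted sum of their min-max-scaled factor scores."""
    rec = min_max([c.recency for c in candidates])
    imp = min_max([c.importance for c in candidates])
    rel = min_max([c.relevance for c in candidates])
    scored = [
        (w_rec * r + w_imp * i + w_rel * s, c.description)
        for r, i, s, c in zip(rec, imp, rel, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [description for _, description in scored[:top_k]]


memories = [
    Candidate("ate breakfast in my room", recency=0.99, importance=2, relevance=0.1),
    Candidate("teacher said the test covers chapter 5", recency=0.90, importance=6, relevance=0.9),
    Candidate("said goodbye to an old friend", recency=0.50, importance=9, relevance=0.2),
]
print(retrieve(memories, top_k=2))
```

Scaling each factor separately keeps a factor with a large raw range (such as a 1-10 importance rating) from drowning out the others.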


Reflection

Reflections are higher-level, more abstract thoughts that the agent generates periodically.

In the paper's implementation, a reflection is generated when the sum of the importance scores of the latest events perceived by the agent exceeds a certain threshold; in practice, agents reflected roughly two to three times per day.
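The trigger can be sketched as a running sum of importance scores that resets after each reflection. The default threshold of 150 below is an assumption (the article only says "a certain threshold"), and `ReflectionTrigger` is a hypothetical helper; the usage example lowers the threshold to keep the numbers small.

```python
class ReflectionTrigger:
    """Accumulates importance scores and signals when a reflection is due."""

    def __init__(self, threshold: float = 150.0) -> None:
        self.threshold = threshold
        self.accumulated = 0.0

    def observe(self, importance: float) -> bool:
        """Add one event's importance; return True (and reset) once the sum exceeds the threshold."""
        self.accumulated += importance
        if self.accumulated > self.threshold:
            self.accumulated = 0.0
            return True
        return False


trigger = ReflectionTrigger(threshold=20)
events = [3, 5, 8, 6, 2, 9]  # hypothetical importance scores of perceived events
fired_at = [i for i, score in enumerate(events) if trigger.observe(score)]
print(fired_at)
```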

The first step of reflection is to decide what the agent should reflect on. To do this, the 100 most recent records in the agent's Memory Stream (e.g., "Klaus Mueller is reading a book on gentrification") are passed to the large language model.

The language model is then asked: "Given only the information above, what are the 3 most salient high-level questions we can answer about the subjects in the statements?"

In response, the language model generates candidate questions such as "What topic is Klaus Mueller passionate about?"
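The question-generation step can be sketched as building a numbered list of the latest memories, appending the instruction quoted above, and splitting the model's reply into questions. The numbering format, the helper names, the filler history, and the sample reply are assumptions for illustration.

```python
import re

QUESTION_PROMPT = (
    "Given only the information above, what are the 3 most salient high-level "
    "questions we can answer about the subjects in the statements?"
)


def build_question_prompt(memories: list[str], window: int = 100) -> str:
    """Number the `window` most recent memories and append the question instruction."""
    recent = memories[-window:]
    lines = [f"{i}. {text}" for i, text in enumerate(recent, start=1)]
    return "\n".join(lines + [QUESTION_PROMPT])


def parse_questions(reply: str) -> list[str]:
    """Split the model's reply into questions, dropping list markers like '1.' or '-'."""
    questions = []
    for line in reply.splitlines():
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


history = [f"event {n}" for n in range(1, 151)]  # hypothetical filler memories
history.append("Klaus Mueller is reading a book on gentrification")
prompt = build_question_prompt(history)
reply = "1. What topic is Klaus Mueller passionate about?"  # hypothetical model reply
print(parse_questions(reply))
```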

Using the generated questions as retrieval queries, the system gathers the relevant memories for each question and prompts the language model to extract insights, citing the specific memories that serve as evidence. The full text of this prompt is as follows.

This process produces statements such as "Klaus Mueller is dedicated to his research on gentrification." These statements are then stored in the Memory Stream as reflections, together with pointers to the memory objects they cite.
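Storing a reflection with pointers to its evidence could look like the sketch below. It assumes the model marks its evidence with a trailing "(because of 1, 2)" where the numbers index the memories shown in the prompt; that format, `parse_insight`, and the second sample memory are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Memory:
    description: str
    kind: str = "observation"
    # Pointers to the memory objects a reflection cites as evidence.
    evidence: list["Memory"] = field(default_factory=list)


def parse_insight(reply: str, cited_pool: list[Memory]) -> Memory:
    """Turn a reply such as 'insight (because of 1, 2)' into a reflection.

    Assumption: the numbers are 1-based indices into `cited_pool`;
    out-of-range numbers are ignored.
    """
    match = re.match(r"\s*(.*?)\s*\(because of ([\d,\s]+)\)\s*$", reply)
    if match is None:
        return Memory(reply.strip(), kind="reflection")
    text, numbers = match.groups()
    indices = [int(n) for n in re.findall(r"\d+", numbers)]
    evidence = [cited_pool[i - 1] for i in indices if 1 <= i <= len(cited_pool)]
    return Memory(text, kind="reflection", evidence=evidence)


pool = [
    Memory("Klaus Mueller is reading a book on gentrification"),
    Memory("Klaus Mueller is discussing his research project with a librarian"),
]
reply = "Klaus Mueller is dedicated to his research on gentrification (because of 1, 2)"
reflection = parse_insight(reply, pool)
stream = pool + [reflection]  # the reflection joins the Memory Stream
print(reflection.description, len(reflection.evidence))
```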

By repeating this process, the agent builds a tree structure (a reflection tree), as shown in the figure below: the leaf nodes are Observations, the agent's direct perceptions of the external environment, and the nodes above them are the increasingly abstract thoughts (Reflections) derived from them.

The figure shows Klaus Mueller's reflection tree, in which Observations and Reflections are repeatedly synthesized until the agent arrives at the self-assessment "Klaus Mueller is highly dedicated to research."
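Because each reflection points to the memories it was derived from, the tree can be walked back down to its observations. The minimal sketch below assumes nodes with an `evidence` list (empty for observations); the node texts other than the two quoted in the article are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    # Memories this thought was synthesized from; empty for a raw observation.
    evidence: list["Node"] = field(default_factory=list)


def is_observation(node: Node) -> bool:
    return not node.evidence


def supporting_observations(node: Node) -> list[str]:
    """Depth-first walk from a reflection down to the leaf observations it rests on."""
    if is_observation(node):
        return [node.text]
    leaves: list[str] = []
    for child in node.evidence:
        leaves.extend(supporting_observations(child))
    return leaves


def abstraction_level(node: Node) -> int:
    """0 for an observation, otherwise one more than its most abstract evidence."""
    if is_observation(node):
        return 0
    return 1 + max(abstraction_level(child) for child in node.evidence)


reading = Node("Klaus Mueller is reading a book on gentrification")
writing = Node("Klaus Mueller is writing a research paper")
dedicated = Node("Klaus Mueller is dedicated to his research on gentrification", [reading, writing])
top = Node("Klaus Mueller is highly dedicated to research", [dedicated])
print(supporting_observations(top), abstraction_level(top))
```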


Planning

For agents to behave coherently, they need to plan their day over a longer time horizon while taking past events into account. Planning describes the agent's future course of action and keeps its behavior consistent over time.

Each plan includes a location, a starting time, and a duration, and plans are stored in the Memory Stream in the same way as reflections.
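A plan entry with those three elements, plus a way to render it as text for the Memory Stream, could be sketched as follows. The `PlanItem` name, the text format, and the location for Eddy Lin's composing session are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PlanItem:
    description: str
    location: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration

    def as_memory_text(self) -> str:
        """Render the plan as a natural-language record for the Memory Stream."""
        return (f"{self.description} at {self.location} "
                f"from {self.start:%H:%M} to {self.end:%H:%M}")


composing = PlanItem(
    description="work on new music composition",
    location="Lin family's house",  # hypothetical location
    start=datetime(2023, 2, 13, 13, 0),
    duration=timedelta(hours=4),
)
print(composing.as_memory_text())
```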

Plans are created top-down: the language model first generates a plan that outlines the day's schedule and then recursively fills in its details. To create the initial plan, the agent's summary description (name, traits, a summary of recent experiences, etc.) and a summary of the previous day are given to the language model. The prompt looks like the following; it ends with an unfinished sentence that the language model completes.

This prompt produces a rough sketch of the agent's day divided into five to eight chunks. For agent Eddy Lin, for example, the daily plan looks like this:

Wake up at 8:00 a.m. and do morning routine → Go to Oak Hill College for classes at 10:00 a.m. → ... → Work on new music composition from 1:00 p.m. to 5:00 p.m. → Eat dinner at 5:30 p.m. → Finish school assignments and go to bed by 11:00 p.m.

The agent then saves this plan in the Memory Stream and breaks it down further into more detailed actions. For example, the plan above to work on a new composition from 1:00 to 5:00 p.m. could be broken down as follows:

1:00 p.m.: Start by brainstorming some ideas for the composition → 4:00 p.m.: Take a break and refresh before reviewing and revising the composition

These steps are then broken down further into actions lasting 5 to 15 minutes each, as shown below.

4:00 p.m.: Eat a light snack of fruit, granola bars, nuts, etc. → 4:05 p.m.: Walk around the workspace a bit → ... → 4:50 p.m.: Clean up the workspace
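The coarse-to-fine process can be sketched as a recursive function that keeps asking a planner to split a step until every piece lasts at most 15 minutes. `fake_planner` is a hypothetical stand-in for the language model: it simply cuts a step into two halves, whereas the real system asks the model for meaningful sub-activities.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    start_min: int  # minutes after midnight
    duration_min: int


Planner = Callable[[Step], list[Step]]


def decompose(step: Step, planner: Planner, finest: int = 15) -> list[Step]:
    """Recursively refine a step until every piece lasts at most `finest` minutes."""
    if step.duration_min <= finest:
        return [step]
    result: list[Step] = []
    for sub in planner(step):
        result.extend(decompose(sub, planner, finest))
    return result


def fake_planner(step: Step) -> list[Step]:
    """Hypothetical stand-in for the language model: split the step into two halves."""
    half = step.duration_min // 2
    return [
        Step(f"{step.description} (part 1)", step.start_min, half),
        Step(f"{step.description} (part 2)", step.start_min + half, step.duration_min - half),
    ]


compose = Step("work on new music composition", 13 * 60, 4 * 60)
fine = decompose(compose, fake_planner)
print(len(fine), fine[0].start_min, fine[-1].start_min + fine[-1].duration_min)
```

The recursion mirrors the day plan → hour-level chunks → 5-15 minute actions structure; only the splitting rule is simplified.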

Agents can change these plans midway in response to changes in the environment and interactions with other agents, so plans are updated dynamically at each time step.
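A minimal way to picture mid-course replanning: when something happens, keep the steps already finished, cut the current step short, and replace the rest with newly planned steps. The `replan` helper, the schedule, and the interruption are hypothetical; in the paper, the language model decides whether and how the agent reacts.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    start_min: int     # minutes after midnight
    duration_min: int

    @property
    def end_min(self) -> int:
        return self.start_min + self.duration_min


def replan(schedule: list[Step], now_min: int, new_steps: list[Step]) -> list[Step]:
    """Keep finished steps, cut the in-progress step short at `now_min`,
    drop steps that have not started, and append the newly planned steps."""
    kept: list[Step] = []
    for step in schedule:
        if step.end_min <= now_min:
            kept.append(step)
        elif step.start_min < now_min:
            kept.append(Step(step.description, step.start_min, now_min - step.start_min))
    return kept + new_steps


schedule = [
    Step("brainstorm ideas for the composition", 13 * 60, 60),
    Step("work on the composition", 14 * 60, 120),
    Step("eat dinner", 17 * 60 + 30, 60),
]
# Hypothetical interruption at 2:30 p.m.: a conversation, then the rest of the day re-planned.
new_steps = [
    Step("talk with a family member who stopped by", 14 * 60 + 30, 15),
    Step("eat dinner", 17 * 60 + 30, 60),
]
updated = replan(schedule, 14 * 60 + 30, new_steps)
print([(s.description, s.start_min, s.duration_min) for s in updated])
```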

End-to-end Evaluation

The paper evaluates the 25 agents living in Smallville to show that the architecture described above enables a more believable simulation of collective behavior.


Information diffusion is a commonly studied phenomenon in the social and behavioral sciences, and when there is important information, one would expect it to spread among the agents as well.

To test whether this happens, the paper tracked how two specific pieces of information spread among the 25 agents in Smallville over two days:

  1. Sam's candidacy for village mayor
  2. Isabella's Valentine's Day party at Hobbs Cafe

At the start of the simulation, each piece of information is known only to its source: Sam, the candidate for village mayor, and Isabella, the party organizer.

To check whether the information had spread, each of the 25 agents was asked the following questions after the two days:

  • "Do you know who is running for mayor?"
  • "Did you know there is a Valentine's Day party?"

Each agent's response was labeled YES if it knew the information and NO if it did not.
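Tallying the labeled answers is simple arithmetic: the share of informed agents is the YES count divided by the 25 agents. The `diffusion_rate` helper and the labels below are made up for illustration.

```python
def diffusion_rate(labels: dict[str, str]) -> tuple[int, float]:
    """Return how many agents were labeled YES and what share of all agents that is."""
    informed = sum(1 for label in labels.values() if label == "YES")
    return informed, informed / len(labels)


# Hypothetical labels for 25 agents: the first 12 answered that they knew.
labels = {f"agent_{i:02d}": ("YES" if i < 12 else "NO") for i in range(25)}
count, share = diffusion_rate(labels)
print(f"{count} agents ({share:.0%})")
```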

In addition, the experiment measured the change in the density of relationships (network density) formed through conversations between agents during the simulation.

Before and after the simulation, each agent was asked "Do you know of <name>?" about each of the other agents, and two agents were considered to have formed a relationship if they both knew each other.

The answers were then used to build an undirected graph whose 25 vertices (V) represent the agents and whose edges (E) connect agents who know each other. The network density η was calculated as η = 2|E| / (|V|(|V| − 1)), i.e., the fraction of all possible pairs of agents that are connected.
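The sketch below builds the undirected graph from "do you know X?" answers, keeping an edge only when both agents know each other, and computes the density with the formula η = 2|E| / (|V|(|V| − 1)). The three-agent answer set and the helper names are hypothetical.

```python
from itertools import combinations


def mutual_edges(knows: dict[str, set[str]]) -> set[frozenset[str]]:
    """An undirected edge exists only if both agents answered that they know each other."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(knows), 2)
        if b in knows[a] and a in knows[b]
    }


def network_density(num_vertices: int, num_edges: int) -> float:
    """eta = 2|E| / (|V| (|V| - 1)): share of all possible undirected pairs that are connected."""
    return 2 * num_edges / (num_vertices * (num_vertices - 1))


knows = {
    "Isabella": {"Sam", "Jennifer"},
    "Sam": {"Isabella"},
    "Jennifer": {"Sam"},
}
edges = mutual_edges(knows)
print(len(edges), network_density(len(knows), len(edges)))
```

With 25 agents there are 25 × 24 / 2 = 300 possible pairs, so a density of 0.167 corresponds to roughly 50 mutual acquaintances and a density of 0.74 to roughly 222.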


After two days of simulation, the number of agents who knew about Sam's candidacy for village mayor increased from one (4%) to eight (32%), and the number who knew about Isabella's party increased from one (4%) to twelve (48%).

The figure below shows how information about Isabella's party spread.

As the figure shows, the information spread through interactions between agents, for example along the chain Isabella → Sam → Jennifer.

Network density increased substantially, from 0.167 before the simulation to 0.74 after it, indicating that the agents formed new relationships with each other during the simulation.


How was it? In this article, I explained a paper that successfully simulated collective behavior by constructing an artificial village society of 25 agents generated by an architecture consisting of three elements: Memory Stream, Reflection, and Planning.

The paper also acknowledges limitations of its experiments, including:

  • The experiments on the generated agents covered only a short period (two days), and future research should examine the agents' capabilities and limitations more comprehensively.
  • Language models are known to contain biases, and the agents may behave in ways that reflect these biases.

These and other issues will require further research.

Even so, the agent behaviors demonstrated in this study could be applied in a variety of fields, including social robots and social computing systems, and we are very much looking forward to future progress. Details of the agent architecture and the evaluation results introduced here can be found in the paper, and interested readers should refer to it.
