Catch up on the latest AI articles

Automation of Scientific Experiments with Multiple Large-Scale Language Models: From Autonomous Design to Execution


Large Language Models

3 main points
✔️ Developed an AI agent system that autonomously designs and executes scientific experiments and generates high quality code
✔️ Emphasized ethical and responsible use of this powerful tool and noted the need to mitigate risks associated with misuse
✔️ Identified technical limitations related to hardware and API documentation

Emergent autonomous scientific research capabilities of large language models
written by Daniil A. Boiko, Robert MacKnight, Gabe Gomes
(Submitted on 11 Apr 2023)
Comments: Published on arxiv.
Subjects: Chemical Physics (physics.chem-ph); Computation and Language (cs.CL)


The images used in this article are from the paper, the introductory slides, or were created based on them.


Recent years have seen tremendous progress in large-scale language models, particularly transformer-based models. They have been applied successfully in fields ranging from natural language processing to biology, chemistry, and even code generation, and the massive scaling of models carried out by OpenAI is a particularly significant advance in this area. In addition, techniques such as Reinforcement Learning from Human Feedback (RLHF) are helping to improve the quality of generated text, address more diverse tasks, and strengthen the models' ability to reason about their decisions.

On March 14, 2023, OpenAI released its most powerful LLM to date, GPT-4. While many details of its training methods and data remain undisclosed, GPT-4 has demonstrated exceptional problem-solving ability, with high performance on the SAT and bar exams, LeetCode problems, and even contextual explanations of images (including niche jokes). The report also includes real-world examples of how it can handle chemistry problems.

Building on these results, this paper develops an intelligent agent (hereafter "agent") based on multiple large-scale language models that can autonomously design, plan, and execute complex scientific experiments. The agent can search the Internet, browse relevant documents, use robotic experiment APIs, and even call other large-scale language models to perform a variety of tasks. The paper demonstrates the agent's versatility and efficiency through search and navigation of a wide range of hardware documentation, precise control of liquid handling equipment, and complex problems that require the simultaneous use of multiple hardware modules and the integration of diverse data sources.

Agent Overview

Through its innovative architecture and multiple modules, the agent developed in this paper enables the autonomous design, planning, and execution of scientific experiments. The system consists of four main components, with a central "planner" at its core.

The planner takes action based on the prompt entered (e.g., "run multiple Suzuki reactions") and executes a series of actions in response to this instruction. These actions include searching for information on the Internet, performing calculations in Python, accessing relevant documents, and finally running the experiment. They are performed in a wide variety of environments, including the use of cloud labs, manipulation of liquid handling equipment, or manual experimental instructions.

The agent is designed to gather the information needed for a task, perform the necessary calculations, and take appropriate action. On average, about ten steps are required to fully carry out a requested task, and if the information provided is sufficiently detailed, the agent does not need to ask additional questions.
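The planner's behavior described above can be pictured as a simple dispatch loop: the LLM is repeatedly asked for the next action until it decides to hand off to the experiment hardware. The sketch below is a hypothetical reconstruction, not the paper's actual implementation; the action names and the `fake_llm` stub are assumptions standing in for a real GPT-4 call.

```python
# Hypothetical sketch of the planner's action-dispatch loop.
# The action vocabulary (GOOGLE, PYTHON, DOCUMENTATION, EXPERIMENT)
# mirrors the four components described in the text; the LLM call is
# stubbed out so the loop is runnable on its own.

def fake_llm(prompt: str) -> str:
    """Stand-in for a call to the planner LLM (e.g. GPT-4)."""
    return "EXPERIMENT: run_protocol()"

def plan(task: str, max_steps: int = 10) -> list[str]:
    """Iteratively ask the planner which action to take next."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = fake_llm("\n".join(history))
        history.append(action)
        if action.startswith("EXPERIMENT"):
            break  # final action: hand generated code to the automation component
    return history

steps = plan("run multiple Suzuki reactions")
```

In a real system, each non-terminal action would be routed to the matching component (web search, Python sandbox, documentation search) and its result appended to the history before the next LLM call.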

The web search component receives queries from the planner, translates them into appropriate web search queries, and performs searches via the Google Search API. The resulting web pages are scrutinized and useful information is extracted and provided to the planner. At this stage, GPT-3.5 may be utilized due to its balance of speed and accuracy.
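As a rough illustration of the first step of that pipeline, a planner query can be packaged into a request for Google's public Custom Search JSON API. The endpoint and the `key`/`cx`/`q` parameters are part of that API; the credentials and the result count here are placeholders, and the paper does not specify which search API wrapper was actually used.

```python
# Sketch of how the web-search component might translate a planner
# query into a Google Custom Search request. The API key and
# engine id are placeholders, not working credentials.

GOOGLE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_request(planner_query: str, api_key: str, engine_id: str) -> dict:
    """Return request parameters for the Custom Search JSON API."""
    return {
        "key": api_key,      # placeholder credential
        "cx": engine_id,     # placeholder search-engine id
        "q": planner_query,  # the query produced by the planner
        "num": 5,            # number of top results handed to the summarizer
    }

params = build_search_request("Suzuki coupling reaction conditions", "KEY", "CX")
```

The returned pages would then be fetched and condensed (the article notes GPT-3.5 is a reasonable choice here) before being passed back to the planner.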

The Document Search component searches hardware-related documents to extract the most relevant information. The process focuses on providing syntactic information on specific functional parameters and APIs that are essential for experimentation.

The Code Execution component securely executes code in an isolated Docker container and protects end-host machines from unexpected behavior. The Automation component also executes the generated code on the actual hardware or provides procedures for manual experimentation.
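The isolation idea can be sketched as assembling a `docker run` command for the generated script. The paper does not specify the image or flags used; `--rm` and `--network none` are common hardening choices assumed here for illustration.

```python
# Illustrative sketch of running generated code inside an isolated
# Docker container, as the Code Execution component does. The image
# name and flags are assumptions, not the paper's configuration.

def build_docker_cmd(code_file: str, image: str = "python:3.11-slim") -> list[str]:
    """Assemble a `docker run` command that executes untrusted code."""
    return [
        "docker", "run",
        "--rm",               # discard the container when it exits
        "--network", "none",  # no network access from inside the sandbox
        "-v", f"{code_file}:/sandbox/script.py:ro",  # mount the script read-only
        image,
        "python", "/sandbox/script.py",
    ]

cmd = build_docker_cmd("/tmp/generated.py")
# The command could then be executed with subprocess.run(cmd, capture_output=True).
```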


In this paper, ibuprofen synthesis is used as an example of the agent's performance. Starting with the simple prompt "Synthesize ibuprofen," the agent searches the Internet for the necessary information and gathers details from a specific website about how to synthesize ibuprofen. In the process, the agent correctly identifies the Friedel-Crafts reaction, in which isobutylbenzene and acetic anhydride are catalyzed by aluminum chloride, as the first step in the synthesis.

In addition to ibuprofen, the agent can also efficiently plan and execute syntheses of aspirin and aspartame. Even if the initial synthesis plan is flawed, it can be corrected by providing appropriate synthetic examples. Furthermore, in the Suzuki reaction, the agent accurately identifies substrates and products.

However, instability can be observed in text generation with high temperature parameters when proposing specific catalysts or groups. To solve this, connecting the agent to chemical reaction databases such as Reaxys and SciFinder via APIs has dramatically improved the system's performance and accuracy. Analysis of previous statements made by the system is also an important method to improve accuracy.

In addition, today's technological environment increasingly requires the combination of intelligent agents and software with advanced reasoning capabilities. Key to this challenge is the clear and concise presentation of complex hardware API documentation. Comprehensive software documentation is essential for understanding and effectively utilizing the complex interactions between the diverse components that characterize modern software. However, these documents are often written in highly technical terms that can be difficult for non-technical users to understand. This creates a barrier to entry for new users and limits software adoption and effectiveness.

Here, enlisting the help of large-scale language models is considered as a solution. If software documentation can be generated in natural language in a form that is accessible to non-experts, this barrier could be overcome. For example, models trained on a corpus of text containing extensive information about application programming interfaces (APIs), such as the Opentrons Python API, have the potential to improve the accuracy of agents when using APIs.

For this purpose, the system generates OpenAI ada embeddings across the entire OT-2 API documentation and computes their similarity to the query. The agent is instructed to issue "Documentation" actions as needed for the appropriate use of the API. Based on the query, an ada embedding is generated and the appropriate documentation sections are selected through a distance-based vector search. This process plays an important role in providing the agent with information about the heater-shaker hardware module needed to perform the chemical reaction.
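The retrieval step just described can be sketched as a cosine-similarity search over section embeddings. In the real system the vectors would come from OpenAI's ada embedding model; the `embed` function below is a deterministic toy stand-in so the sketch runs without API access, and the document titles are invented.

```python
# Minimal sketch of the distance-based vector search over API
# documentation. embed() is a placeholder that must be replaced
# with a real call to OpenAI's ada embedding endpoint.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an ada embedding; returns a deterministic toy vector."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(8)

def top_k_sections(query: str, sections: list[str], k: int = 2) -> list[str]:
    """Rank documentation sections by cosine similarity to the query."""
    q = embed(query)
    def cosine(v: np.ndarray) -> float:
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    scored = sorted(sections, key=lambda s: cosine(embed(s)), reverse=True)
    return scored[:k]

docs = ["HeaterShakerModule API", "Pipette transfer usage", "Deck slot layout"]
hits = top_k_sections("how to heat and shake a plate", docs)
```

The top-ranked sections are then injected into the planner's context, which is how the agent learns about hardware (like the heater-shaker) absent from its training data.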

Application of this approach to a diverse robotic platform such as the Emerald Cloud Lab (ECL) presents new challenges. However, this paper does explore the effectiveness of providing information about the symbolic lab language (SLL) of ECL, which is uncharted territory for the GPT-4 model. To this end, we provide agents with a guide to the overall functionality provided by ECL to perform experiments.

As an example, the three cases above illustrate the agent's response to a user-submitted query. In each case, the agent correctly identifies the function needed to perform the task. Once a function is chosen, its raw text documentation is processed by a separate GPT-4 model that preserves and summarizes the code syntax. This model is particularly adept at efficiently retaining information about the various options, instruments, and parameters of a given function. Once the documentation is fully processed, the model is asked to generate a block of code using the given function and return it to the planner. This process gives the agent a basis for using the function with specific options, instruments, and parameters. The goal is to reduce technical barriers and make it easier for users to design and run sophisticated experiments.

In addition, advances in automation technology have made it possible to develop multi-instrument systems that can control multiple devices through commands in natural language. Providing appropriate information to agents is crucial to conducting experiments in the physical world. To serve this purpose, this paper selects an open-source liquid handler with an extensive Python API and provides its "Getting Started" page to the system's planner. Additional page information was also vectorized using the method described in the "Providing Hardware API Documentation" section, but this process does not involve access to the Internet.

This experiment began with a basic attempt at robot manipulation. In particular, the ability to treat the entire microplate as a single unit was required. Simple instructions in natural language, such as "paint every other row with the selected color," resulted in precise protocols in most cases. These protocols, when executed by the robot, closely followed the requested instructions.
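The "every other row" instruction reduces to a small well-selection computation: a standard 96-well microplate has rows A through H and columns 1 through 12, so every other row means rows A, C, E, and G. The pipetting itself would go through the liquid handler's Python API (the paper uses an Opentrons OT-2); the sketch below only computes the target wells and makes no claims about the actual protocol code the agent generated.

```python
# Sketch of the well selection behind "paint every other row" on a
# standard 96-well plate (rows A-H, columns 1-12). Dispensing into
# these wells would be done via the liquid handler's API.

ROWS = "ABCDEFGH"
COLS = range(1, 13)

def every_other_row_wells() -> list[str]:
    """Return well names (e.g. 'A1') for rows A, C, E, G."""
    return [f"{r}{c}" for r in ROWS[::2] for c in COLS]

wells = every_other_row_wells()
```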

The agent's first task was to prepare a small sample of the original solution. Next, it requested that a UV-Vis measurement be performed and, upon completion of the measurement, received the filename of a NumPy array containing the spectral data for each well of the microplate. The agent used this data to write Python code that identified the wavelength of maximum absorbance and solved the problem correctly. This sequence of steps demonstrates new possibilities for using natural language to achieve precise experimental manipulation.
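The analysis step the agent coded amounts to an `argmax` over each well's spectrum. The sketch below assumes a shape of one row per well and one column per wavelength, and an arbitrary 350-750 nm wavelength grid; neither the array layout nor the grid is specified in the article, so both are illustrative.

```python
# Sketch of the UV-Vis analysis: given a NumPy array of spectra
# (one row per well, one column per wavelength), find each well's
# wavelength of maximum absorbance. Shapes and the wavelength grid
# are assumptions; the data here is a toy stand-in.
import numpy as np

wavelengths = np.arange(350, 751, 10)       # assumed 350-750 nm grid
spectra = np.zeros((96, wavelengths.size))  # 96 wells of toy data
spectra[0, 15] = 1.0                        # well A1 peaks at 500 nm

def max_absorbance_wavelengths(spectra: np.ndarray, wl: np.ndarray) -> np.ndarray:
    """Return the wavelength of maximum absorbance for each well."""
    return wl[np.argmax(spectra, axis=1)]

peaks = max_absorbance_wavelengths(spectra, wavelengths)
```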

In the experiments conducted so far, the agent's prior knowledge of the modules could have influenced the results. Therefore, the authors test the agent's ability to plan experiments by performing the necessary calculations based on data obtained from the Internet and finalizing the code for the liquid handler. To make the task more complex, the agent is asked to use the Heater-Shaker module, which was released after the cutoff of GPT-4's training data. These requirements are incorporated into the agent's configuration.

In the designed problem, the agent is given a liquid handler equipped with two microplates. The source plate contains phenylacetylene, phenylboronic acid, several aryl halide coupling partners, two catalysts, bases, and a solvent to dissolve the samples. The target plate is mounted in the heater-shaker module. The agent's goal is to design a protocol to perform the Suzuki and Sonogashira reactions.

The agent begins by searching the Internet for information about the requested reaction and its conditions. It selects a suitable coupling partner for each reaction: bromobenzene for the Suzuki reaction and iodobenzene for the Sonogashira reaction, though this choice varies from run to run. This suggests a future use case in which the model runs the experiment multiple times, analyzes its own reasoning, and builds a larger overall picture.

The model chooses a Pd/NHC catalyst as a more efficient and modern approach to cross-coupling reactions, and triethylamine as the base. The agent then calculates the required amounts of all reactants and writes out the protocol. However, because it uses an incorrect name for the heater-shaker module, the model consults the documentation and successfully corrects the protocol based on that information. The resulting GC-MS analysis confirms that the target products of both reactions were produced.
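The reagent-amount calculation the agent performs is essentially a stock-solution volume computation. The sketch below shows the unit conversion involved; the concentrations and target amounts are made-up numbers, not values from the paper.

```python
# Illustrative version of the reagent-amount calculation: convert a
# target amount of substance into a stock-solution volume. All
# numbers here are invented for the example.

def stock_volume_ul(target_mmol: float, stock_conc_m: float) -> float:
    """Volume (uL) of stock needed for target_mmol at stock_conc_m (mol/L)."""
    # mmol / (mol/L) = mL, then mL -> uL
    return target_mmol / stock_conc_m * 1000.0

# e.g. 0.05 mmol of an aryl halide from a hypothetical 0.5 M stock
vol = stock_volume_ul(0.05, 0.5)
```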


This paper presents an intelligent agent system that can autonomously design, plan, and execute complex scientific experiments. The system exhibits superior reasoning and experiment design capabilities, can effectively address complex problems, and can generate high-quality code. However, the development of new machine learning systems and automated methods for scientific experiments raises safety and dual-use concerns, such as illegal activities and increased security threats. Ensuring the ethical and responsible use of these powerful tools will mitigate the risks associated with their misuse while continuing to explore the potential of large-scale language models in advancing scientific research.

Takumu
I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.

If you have any suggestions for improvement of the content of the article,
please contact the AI-SCHOLAR editorial team through the contact form.

Contact Us