
Applications And Prospects For Large-Scale Language Models In Chemistry And Materials Science As Seen In The Success Of The Hackathon


Large Language Models

3 main points
✔️ Demonstration of the potential of large-scale language models: at a hackathon, complex prototypes were built in just a few hours, showing the potential of large-scale language models for research in chemistry and materials science.
✔️ New modeling methods in chemistry and materials science: large-scale language models provide new methods for handling unstructured data and incorporating contextual information in these fields.
✔️ New challenges and the need for diverse expert collaboration: issues of transparency and access to large-scale language models remain, and their safe use and next-generation education require collaboration among diverse experts.

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
written by Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, Garrett W. Merz, Nicolas Moitessier, Elias Moubarak, Beatriz Mouriño, Brenden Pelkie, Michael Pieler, Mayk Caldas Ramos, Bojana Ranković, Samuel G. Rodriques, Jacob N. Sanders, Philippe Schwaller, Marcus Schwarting, Jiale Shi, Berend Smit, Ben E. Smith, Joren Van Herck, Christoph Völker, Logan Ward, Sean Warren, Benjamin Weiser, Sylvester Zhang, Xiaoqi Zhang, Ghezal Ahmad Zia, Aristana Scourtas, KJ Schmidt, Ian Foster, Andrew D. White, Ben Blaiszik
(Submitted on 9 Jun 2023 (v1), last revised 14 Jul 2023 (this version, v4))
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)


The images used in this article are from the paper, the introductory slides, or were created based on them.


In recent years, remarkable progress has been made in the integration of machine learning with chemistry and materials science. From accelerating simulations to predicting specific compounds and properties, machine learning is opening up new possibilities for scientific inquiry. Despite this progress, however, the development of models specific to individual problems remains the norm, and the diversity and complexity of scientific inquiry makes the development of general tools difficult.

Particularly in the fields of chemistry and materials science, there is a remarkable diversity and contextual dependence in the format of data and in the description of experimental methods. Because of this, integrating and transforming data into machine learning models is challenging, and scientists struggle to create "glue codes" to connect different tools and achieve complex workflows.

The potential of large-scale language models to address this challenge is attracting attention. In particular, large-scale language models such as GPT-4 offer new ways to tackle problems that are difficult to solve using traditional approaches. These models have been shown to be particularly effective in extracting knowledge from unstructured text and in creating interfaces between tools via natural language.

This paper reports the results of a hackathon event exploring the potential of this technology for large-scale language modeling in chemistry and materials science. During this hackathon, a wide variety of projects were proposed and actually prototyped, including predictive modeling, automation, development of new interfaces, knowledge extraction, and education, as shown in the table below.

These hackathon efforts suggest that large-scale language models have the potential to revolutionize scientific research and could become a foundational tool in future research. This article presents a sampling of some of the results of the hackathon.

Prototype Introduction

The first is a genetic algorithm that utilizes large-scale language models. Genetic algorithms take an evolutionary approach in which building blocks are crossed and mutated to produce better structures. The outcome of this approach is highly dependent on its compatibility with the underlying chemistry, and the McGill University team suggests that incorporating large-scale language models into genetic algorithms has the potential to significantly improve the efficiency of this process.

First, the large-scale language model demonstrated the ability to understand and efficiently reconstruct SMILES strings representing chemical structures. In initial experiments, GPT-3.5 properly decomposed molecules with a 70% success rate (Fragment). Furthermore, when recombining two molecules, the large-scale language model was found to often produce more chemically reasonable structures than random methods, and evaluation by organic chemists confirmed that all molecules generated by the model were chemically reasonable (Reproduce). Finally, the McGill University team asked the large-scale language model to propose new molecules based on specific performance indicators, and results from this initial phase show that the model can propose chemically reasonable improvements (Optimize).
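The recombine-and-select loop behind such an approach can be sketched as follows. This is a minimal illustration, not the McGill team's implementation: the LLM call is replaced by a stub returning a canned offspring, and the fitness function is a toy placeholder for a real property predictor.

```python
import random

def llm_recombine(parent_a: str, parent_b: str) -> str:
    """Stub for the LLM call. In the real prototype, an instruction such as
    'Combine fragments of these two SMILES into a new, chemically valid
    molecule' would be sent to the model; here a canned answer stands in."""
    canned = {frozenset({"CCO", "c1ccccc1"}): "c1ccccc1CCO"}  # hypothetical offspring
    return canned.get(frozenset({parent_a, parent_b}), parent_a)

def fitness(smiles: str) -> float:
    """Toy objective: longer SMILES scores higher (a placeholder for a real
    property predictor such as logP or binding affinity)."""
    return float(len(smiles))

def evolve(population: list[str], generations: int = 3) -> str:
    """Minimal genetic-algorithm loop with LLM-driven recombination."""
    for _ in range(generations):
        a, b = random.sample(population, 2)
        child = llm_recombine(a, b)
        # Keep the child only if it beats the current worst individual.
        worst = min(population, key=fitness)
        if fitness(child) > fitness(worst):
            population[population.index(worst)] = child
    return max(population, key=fitness)
```

The design choice to keep selection and bookkeeping in ordinary code, delegating only the chemistry-aware recombination step to the model, mirrors the division of labor described above.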

This prototype demonstrates that large-scale language models can be powerful tools for generating chemical structures and improving existing ones. However, the technology is still in its early stages, and challenges remain, especially in the accurate generation of SMILES. In the future, the development of foundation models specialized for chemistry could overcome these challenges. The results of the McGill University research team show that genetic algorithms using large-scale language models have the potential to revolutionize the design and improvement of chemicals. This approach is more efficient than traditional methods and may pave the way for the discovery of new chemical structures.

The second is MAPI-LLM. The accuracy of electronic structure calculations has reached such a high level that questions like "Is the material AnByCz stable?" can now be answered. In fact, the Materials Project stores thermodynamic data for many compounds, allowing a reasonable estimate of the stability of a given material. If a material is not in the database, a simulation can be performed instead. Similarly, for questions such as "Tell me the reaction that produces CaCO3," there is plenty of useful information in the Materials Project database and on the Internet to help find the answer.

State-of-the-art computational tools and existing databases can be used to answer these questions. However, their use requires expertise. To use an existing database, you need to choose which database to use, how to query the database, and which representation of the compound to use (e.g., International Chemical Identifier (InChI), SMILES, etc.). If the data is not in the database, calculations must be performed, which requires a deep understanding of the technical details. Large-scale language models can simplify the use of such tools. By entering a question, you can prompt the large-scale language model to translate that question into a workflow that leads to an answer.

The MAPI-LLM team took the first steps toward developing such a system (MAPI-LLM), creating a procedure that converts text prompts into Materials Project API (MAPI) queries to answer questions such as "Is the material AnByCz stable?" The system handles both classification questions such as "Is Fe2O3 magnetic?" and regression problems such as "What is the band gap of Mg(Fe2O3)2?"
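The question-to-query step can be sketched as below. This is a hypothetical illustration, not MAPI-LLM's code: the real system prompts an LLM to choose the query, while here a keyword rule and an invented toy table stand in for both the LLM and the Materials Project API.

```python
import re

TOY_DB = {
    # Invented placeholder values for illustration, not Materials Project data.
    "Fe2O3": {"is_magnetic": True, "band_gap_eV": 2.0},
}

def extract_formula(question: str) -> str:
    """Pull the first chemical-formula-like token (one containing a digit)."""
    match = re.search(r"[A-Z][A-Za-z0-9()]*\d[A-Za-z0-9()]*", question)
    if match is None:
        raise ValueError("no formula found")
    return match.group()

def route_question(question: str) -> tuple[str, str]:
    """Decide which property to query -- the step the LLM performs in MAPI-LLM."""
    q = question.lower()
    if "magnetic" in q:
        return "is_magnetic", extract_formula(question)
    if "band gap" in q:
        return "band_gap_eV", extract_formula(question)
    raise ValueError("unrecognised question")

def answer(question: str):
    prop, formula = route_question(question)
    return TOY_DB[formula][prop]
```

For example, `answer("Is Fe2O3 magnetic?")` routes to the magnetism field of the toy table; in the real system this routing decision is made by the language model and the lookup goes through the Materials Project API.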

Because it uses a large language model to create workflows, MAPI-LLM can handle even more complex questions. For example, the question "If Mn23FeO32 is not a metal, what is its band gap?" produces a two-step workflow that first checks whether the material is a metal and, if not, calculates its band gap. In addition, MAPI-LLM applies in-context learning (ICL) when material property data is not available via MAPI: it generates an ICL prompt and builds a context from similar material data available in the Materials Project database. This context is then leveraged by the large-scale language model to infer properties of the unknown material. This innovative use of ICL fills a data gap and increases the robustness and versatility of MAPI-LLM.
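The ICL fallback amounts to assembling a few-shot prompt from similar materials and ending with the unanswered query. A minimal sketch, in which all formulas and values are invented placeholders rather than Materials Project data:

```python
# Hypothetical neighbour data that would, in the real system, be retrieved
# from the Materials Project database for materials similar to the target.
NEIGHBOURS = [
    ("MgO", "band gap", "7.8 eV"),
    ("CaO", "band gap", "7.1 eV"),
]

def build_icl_prompt(target: str, prop: str) -> str:
    """Build a few-shot context ending in the open query for the LLM."""
    lines = [f"{formula}: {p} = {value}" for formula, p, value in NEIGHBOURS]
    lines.append(f"{target}: {prop} = ?")
    return "\n".join(lines)
```

The model completes the final line by analogy with the examples, which is what lets the system return an estimate even when the database has no entry for the target material.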

The third is sMolTalk. In general, chemistry software takes considerable time to learn to operate; visualization software is a prime example. Chemists and materials scientists can spend hours or even days learning the details of a particular visualization package. The sMolTalk development team addressed this inefficiency by using a large language model to write code for visualization tools such as 3dmol.js. The figure below shows the interface. With only a few-shot prompt containing a handful of examples that pair user input with the expected JavaScript, the team prototyped an interface that can manipulate the 3dmol.js viewer, retrieve protein structures from the Protein Data Bank (PDB), color-code parts of a structure in a particular way, and more.

In this example, the user enters a sequence of four commands. The large language model (1) generates code to retrieve the structure, (2) colors carbon blue, (3) displays hydrogen as a red sphere, and (4) reduces the size of the sphere.
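A few-shot prompt of the kind sMolTalk relies on can be sketched as follows. The instruction-to-code pairings are illustrative assumptions, not sMolTalk's actual prompt, though `setStyle` and `addStyle` are real 3dmol.js viewer methods.

```python
# Example (instruction, JavaScript) pairs shown to the model as context.
FEW_SHOT = [
    ("color the carbons blue",
     "viewer.setStyle({elem: 'C'}, {stick: {color: 'blue'}});"),
    ("show hydrogens as red spheres",
     "viewer.addStyle({elem: 'H'}, {sphere: {color: 'red'}});"),
]

def build_prompt(user_request: str) -> str:
    """Assemble a few-shot prompt ending at the code the model must write."""
    parts = ["Translate the instruction into 3dmol.js JavaScript."]
    for instruction, code in FEW_SHOT:
        parts.append(f"Instruction: {instruction}\nCode: {code}")
    parts.append(f"Instruction: {user_request}\nCode:")
    return "\n\n".join(parts)
```

The model's completion after the final "Code:" is then executed in the browser against the 3dmol.js viewer, which is how a handful of examples suffices to cover open-ended user commands.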

The beauty of a language model is that users can phrase prompts in a variety of ("fuzzy") ways. One can write "colour" or "color," or "light yellow" instead of "pale yellow," and the large-scale language model will translate them into something the visualization software can interpret.

However, this application also highlights the need for further development of these large-scale language model-based tools. For example, one of the challenges facing the sMolTalk tool is robustness. Specifically, unexpected fragments of the user-entered prompt can end up in the generated output: the model misinterprets parts of the prompt, and irrelevant information is included in the code. To address this problem, more sophisticated methods are needed. One example is a "retry" mechanism, whereby if the generated code fails, the model reads the error message to understand what went wrong, corrects the problem, and tries again. This requires granting the model access to the error message so that it can decide what to do next. Further improvements could come from leveraging a knowledge base such as the 3dmol.js documentation.

The fourth is the I-Digest educational tool. Large-scale language models can also offer new educational opportunities. The I-Digest team proposes a tool that provides digital tutoring based on course material, such as lecture recordings. Using the Whisper model, video recordings of lectures are transcribed into text. These transcripts are then fed into a large-scale language model along with prompts asking it to come up with questions about the content presented in the video. In the future, these questions could be presented to students before the video begins, allowing them to skip parts they already know, or after the video, recommending relevant timestamps and additional material in case of incorrect answers.
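The transcribe-then-question pipeline can be sketched in two steps. The Whisper call is stubbed here with a fixed transcript, and the prompt wording is an assumption rather than the I-Digest team's actual prompt.

```python
def transcribe(audio_path: str) -> str:
    """Stand-in for a Whisper transcription call; returns a fixed transcript."""
    return "Today we discuss the ideal gas law, PV = nRT."

def question_prompt(transcript: str, n_questions: int = 3) -> str:
    """Wrap the transcript in a question-generation prompt for the LLM."""
    return (
        f"Here is a lecture transcript:\n{transcript}\n\n"
        f"Write {n_questions} short quiz questions about its content."
    )

# Hypothetical usage: transcribe a recording, then build the prompt.
prompt = question_prompt(transcribe("lecture_01.mp3"))
```

Because the questions are generated rather than hand-written, the same pipeline can be re-run on any number of recordings, which is what makes the "virtually unlimited" question supply described below possible.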

Importantly, in contrast to traditional teaching materials, this approach could generate a virtually unlimited number of questions and could be continually improved by student feedback in the future. Furthermore, one could easily imagine extending this approach to take into account lecture notes and books to further guide students or recommend specific exercises.


The fact that the teams participating in the hackathon were able to present prototypes that accomplish complex tasks in a short period of time is indicative of the potential of large-scale language models. These prototypes were realized in just a few hours, whereas they would traditionally have required months of programming work. By experimenting in a low-risk environment, the participating teams achieved a level of motivation and results that would otherwise have been hard to reach.

The use of large-scale language models enables modeling in new fields, including chemistry and materials science. This includes incorporating contextual information and working directly with unstructured data. Tools such as Copilot and ChatGPT are emerging as a means of eliminating programming and tool development uncertainties. These advances are opening up a future where end users can easily create and customize applications.

Also interestingly, the logic of many of the tools is written in English rather than in a programming language. This makes the generated code shorter, easier to understand, and less dependent on specific libraries than before. While this demonstrates the effectiveness of describing technical solutions in natural language, we must also recognize the limited interpretability and lack of robustness of large-scale language models.

Also, because many of these tools use OpenAI's API, how the underlying models are built is opaque, and reliable access is not guaranteed. While the OpenAI API is easy to use, the performance of such publicly available large-scale language models can be unstable, especially in new types of applications. Therefore, their use in molecular and materials science requires the development of new benchmarks specific to these fields, along with a framework to evaluate their ability to handle context and unstructured data.

In addition, exploring the potential of these large-scale language models will require collaboration among a diverse set of experts, including chemists and computer scientists as well as legal experts. Safe use of these tools, evaluation criteria, robust deployment, and education that ensures the next generation of scientists can use them effectively are also important issues.
