"BioinspiredLLM": Innovations In Biological Materials Research Using Large-scale Language Models

Large Language Models 24/05/2024

3 main points
✔️ A revolution in bio-inspired materials design: BioinspiredLLM, a new large language model, aims to accelerate research in biological materials design.
✔️ BioinspiredLLM's text mining and data cleaning capabilities: BioinspiredLLM uses a Q-A processing distillation technique to mine and clean text and data. It generates realistic dialogue from formal text and significantly reduces text fragments.
✔️ Collaborating with generative AI for efficient materials design and development: BioinspiredLLM can collaborate with other generative AI models. This new generative AI cooperative agent framework could significantly reduce the time and resources required to design and develop bio-inspired materials.

BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-inspired Materials
written by Rachel K. Luu, Markus J. Buehler
(Submitted on 15 Sep 2023 (v1), last revised 11 Dec 2023 (this version, v2))
Comments: Published on arxiv.
Subjects: Materials Science (cond-mat.mtrl-sci); Disordered Systems and Neural Networks (cond-mat.dis-nn); Soft Condensed Matter (cond-mat.soft); Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

The intersection of materials science, biology, and engineering has long held great promise. Materials inspired by biological structures could transform the design of new sustainable, high-performance materials thanks to their hierarchical structure-property relationships. From armadillo shells to bamboo to coconut husks, materials in nature hold potential that humans have yet to fully exploit. However, the transfer of this knowledge from biological research to actual engineering applications has only just begun.

Today, the importance of learning from nature is being reaffirmed as a way to address environmental issues. In this context, advances in materials informatics offer new opportunities to accelerate the development of bio-inspired solutions. In particular, artificial intelligence techniques such as large language models have the potential to significantly advance research in this field. These models are pre-trained on extensive text data, and fine-tuning them on a specific scientific domain can make them outperform the original models in that domain.

Furthermore, understanding complex structures in nature and applying them to materials design requires translating and connecting different knowledge domains. Knowledge of biological materials is generally broader in scope than knowledge of chemical compounds or protein sequences. Unlike those, however, materials in nature have no standardized method for describing their structures and properties in one place. In this regard, autoregressive large language models have the potential to synthesize a vast body of literature and provide a new approach to aid the materials discovery and design process.

Specializing large language models in biological materials is a promising step toward accelerating the research and discovery of bio-inspired materials. Learning from nature and applying that knowledge is expected to pave the way for a sustainable future.

BioinspiredLLM Overview

The base model initially selected for this paper is Llama-2-13b-chat, an open-source conversational large language model. The paper also uses Orca-2-13b, which is itself a fine-tuned version of Llama 2 with enhanced reasoning capabilities. Both models are fine-tuned on a corpus containing over 1,000 articles dedicated to structural biological materials. The figure below shows the publishers and publication years of the articles in the corpus.

Publishers such as Elsevier, Wiley, Springer Nature, and the American Chemical Society account for many of the articles.

Two methods are employed to prepare the training data: the first uses the text as-is and generates data with standard token lengths; the second, called "Q-A processing," uses the original, un-fine-tuned Llama-2-13b-chat model to further refine and clean the textual content and extract key insights as question-answer pairs.

Fine-tuning was performed with both methods, but the model trained on text without Q-A processing retained a lot of undesirable residual information. Therefore, the Q-A processing method is adopted in this paper.
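The Q-A processing step can be pictured with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the chunk size, the "Q:"/"A:" output format, and the names `chunk_text`, `parse_qa`, `distill`, and `fake_generate` are all assumptions made for illustration. Any function that maps a prompt string to a model reply (for example, a call to Llama-2-13b-chat) could be passed in place of the stand-in.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical instruction; the paper's actual prompt is not given in this article.
QA_PROMPT = (
    "Read the passage below and write question-answer pairs that capture its "
    "key insights. Start each question with 'Q:' and each answer with 'A:'.\n\n"
    "Passage:\n{chunk}"
)

QA_PATTERN = re.compile(r"Q:\s*(.+?)\s*A:\s*(.+?)(?=\s*Q:|\Z)", re.DOTALL)


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split raw article text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def parse_qa(reply: str) -> List[Tuple[str, str]]:
    """Extract (question, answer) pairs from a reply in the assumed Q:/A: format."""
    return [(q.strip(), a.strip()) for q, a in QA_PATTERN.findall(reply)]


def distill(text: str, generate: Callable[[str], str], max_words: int = 200) -> List[Tuple[str, str]]:
    """Ask the model to turn each chunk into Q-A pairs and collect them all."""
    pairs: List[Tuple[str, str]] = []
    for chunk in chunk_text(text, max_words):
        pairs.extend(parse_qa(generate(QA_PROMPT.format(chunk=chunk))))
    return pairs


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same two pairs."""
    return (
        "Q: What is nacre made of?\n"
        "A: Aragonite platelets and a thin organic layer.\n"
        "Q: Why is nacre tough?\n"
        "A: Its brick-and-mortar structure deflects cracks."
    )
```

Because fragments such as headers, citations, and broken sentences rarely survive being restated as a question and an answer, a step of this kind is one plausible reason the Q-A processed data is cleaner than the raw text.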

In this paper, Q-A processing was used to fine-tune both the Llama-2-13b-chat and Orca-2-13b models described above. The fine-tuned Orca-2-13b model in particular showed a significant performance improvement, so it is designated "BioinspiredLLM". For comparison, the fine-tuned Llama-2-13b-chat model is designated "Llama-BioLLM".

The figure below provides an overview of the BioinspiredLLM architecture. The model is based on an autoregressive transformer, and the figure illustrates the flow from the system prompt and the user query to the response generated from them.
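The flow from system prompt and user query to response can be sketched in a few lines of Python. This is only an illustration of autoregressive generation in general: the plain-text template in `build_prompt` is hypothetical (real chat models each define their own template), whitespace splitting stands in for a real tokenizer, and `canned_model` replaces the transformer with a fixed reply.

```python
from typing import Callable, List


def build_prompt(system_prompt: str, user_query: str) -> str:
    """Join the system prompt and the user query into one input string.

    The template is an assumption for illustration, not the model's real format.
    """
    return f"System: {system_prompt}\nUser: {user_query}\nAssistant:"


def generate(
    prompt: str,
    next_token: Callable[[List[str]], str],
    max_new_tokens: int = 50,
    stop: str = "<eos>",
) -> str:
    """Autoregressive loop: each new token is predicted from everything so far."""
    context = prompt.split()  # toy whitespace "tokenizer"
    output: List[str] = []
    for _ in range(max_new_tokens):
        token = next_token(context + output)
        if token == stop:
            break
        output.append(token)
    return " ".join(output)


def canned_model(reply: str) -> Callable[[List[str]], str]:
    """Stand-in for a transformer: emits a fixed reply one token at a time."""
    reply_tokens = reply.split() + ["<eos>"]
    state = {"i": 0}

    def next_token(context: List[str]) -> str:
        token = reply_tokens[min(state["i"], len(reply_tokens) - 1)]
        state["i"] += 1
        return token

    return next_token
```

In a real model, `next_token` would run the transformer over the whole context and sample from the predicted distribution; the loop around it is the same.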

In addition, to evaluate the performance of BioinspiredLLM, the paper prepares a set of carefully selected queries based on the framework proposed by Brodnik et al. These queries cover three main tasks that the model may face.

The first is the "Knowledge Recall" task, which tests how accurately the model recalls knowledge about biological materials. The second is the "Hypothesis Generation" task, which evaluates how well the model generates new insights and ideas about biomaterials research, including experimental design, subject selection, and areas of application. The third is the "Generative AI Collaboration" task, which examines how effectively the model can assist researchers with the specific individual tasks they face, such as prompt engineering and clustering.

Through these evaluations, the potential and applicability of BioinspiredLLM in the complex area of scientific research is revealed. This article specifically addresses the Hypothesis Generation task and the Assistant (Generative AI Collaboration) task.

Hypothesis generation

BioinspiredLLM also serves as an engine of scientific creativity. The model can answer questions about subjects it has not seen before and can guide researchers by proposing new predictions and hypotheses. For example, users can ask the model about experimenting with subjects that have not been explicitly studied in the literature.

In this example, the user asks about researching the gum nut of eucalyptus. This is a small woody organ that grows on eucalyptus trees and, according to a literature search, has not been explicitly studied before. Here, BioinspiredLLM integrates general knowledge from pre-training with knowledge about biological materials and their characterization from the fine-tuning dataset. BioinspiredLLM relies on pre-training to recognize the eucalyptus gum nut, and then draws on the fine-tuning data to suggest experiments based on the articles on plant material characterization it contains. These include experiments on water absorption and biodegradation, factors that usually have a significant impact on the mechanics of biological materials.

As another experimental use case, BioinspiredLLM is asked for a hypothesis about the mechanical properties of jackfruit, as shown in the figure below.

Jackfruit was recently studied by Lazarus et al., but this work was published shortly after the dataset collection period and is therefore not included in the current dataset. In other words, BioinspiredLLM has not "seen" this study.

A comparison is shown between BioinspiredLLM's response (Figure a) and a figure taken directly from Lazarus et al. (Figure b). When asked to hypothesize about the structure of jackfruit spines, BioinspiredLLM predicts that the spines form a network that aids energy absorption under impact loading, a hypothesis strongly supported by the findings of Lazarus et al.

In addition, BioinspiredLLM makes an important further point: the spines can also help control crack propagation. This is exactly what was shown in the compact tension experiments of Lazarus et al., where cracks are seen propagating around the spines and along a foamy matrix in a controlled manner. BioinspiredLLM has never seen this data, yet it predicts these findings.

Assistant

BioinspiredLLM can assist with individual research tasks. In the following excerpt, the user asks BioinspiredLLM to assemble a dataset.

BioinspiredLLM provides clear and organized tables of species and their structural and mechanical properties upon user request. These responses, and extended versions of them, can be used to rapidly generate a complete dataset of biological materials. In addition, the method limits the subjective biases that may arise when humans select and group species and properties.

In more unconventional and engaging scenarios, BioinspiredLLM can assist with prompt engineering and collaborate with other generative AI models. Through collaboration with Stable Diffusion 2.0, a text-to-image model, BioinspiredLLM also supports the user's idea generation. The figure below shows a potential workflow that illustrates the collaboration between BioinspiredLLM, Stable Diffusion 2.0, and the user. In the conversation, the user and BioinspiredLLM discuss appropriate prompts for generating a bio-inspired 2D image with a text-to-image model, and BioinspiredLLM suggests multiple prompts, which are highlighted.

When a user asks BioinspiredLLM for detailed and specific prompts for a design inspired by elements in nature, such as algae, feathers, spider webs, or coral, the prompts it outputs are fed into the image synthesis tool Stable Diffusion 2.0, which generates images such as those shown below. This can help users brainstorm ideas.
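The hand-off from the language model to the text-to-image model can be sketched as a small Python pipeline. Everything here is a simplified assumption: `ask_llm` and `text_to_image` are hypothetical callables standing in for BioinspiredLLM and Stable Diffusion 2.0, the request wording is invented, and the model is assumed to answer with a numbered list. The two stubs exist only so the sketch runs without any model.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, List

# Matches lines such as '1. some prompt' or '2) some prompt'.
NUMBERED_LINE = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$", re.MULTILINE)


@dataclass
class Candidate:
    prompt: str  # text suggested by the language model
    image: Any   # whatever the text-to-image model returns


def extract_prompts(reply: str) -> List[str]:
    """Pull the numbered suggestions out of the language model's reply."""
    return [p.strip('"') for p in NUMBERED_LINE.findall(reply)]


def brainstorm(
    theme: str,
    ask_llm: Callable[[str], str],
    text_to_image: Callable[[str], Any],
) -> List[Candidate]:
    """Ask the LLM for image prompts, then render each one."""
    request = (
        f"Suggest detailed text-to-image prompts for a 2D design inspired by {theme}. "
        "Answer as a numbered list."
    )
    return [Candidate(p, text_to_image(p)) for p in extract_prompts(ask_llm(request))]


def stub_llm(request: str) -> str:
    """Stand-in for BioinspiredLLM."""
    return (
        "Here are some ideas:\n"
        '1. "A lattice inspired by coral branches"\n'
        "2. A layered panel inspired by feather barbs\n"
    )


def stub_text_to_image(prompt: str) -> str:
    """Stand-in for Stable Diffusion 2.0; returns a label instead of pixels."""
    return f"<image for: {prompt}>"
```

The user stays in the loop at the end of the pipeline: the returned candidates are shown side by side, and the user picks the ones worth developing further.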

The selected images can then be converted into 3D models using thermal mapping techniques, which extends their application. The generated 3D models can serve as the basis for future simulations and experiments, greatly accelerating the design and development of bio-inspired materials.

In another example, the user can ask BioinspiredLLM for ideas for combining two biomaterial structures, as shown in the figure below.

BioinspiredLLM offers creative suggestions such as combining plant cell walls with animal hooves, sponge spicules with bone, and lotus leaves with butterfly wings. Not only does BioinspiredLLM offer attractive combinations of biological species, but each response also outlines the logic of the material selection in terms of material properties and proposes hypotheses about the behavior of the new design.

These design ideas are also input into Stable Diffusion 2.0 to generate 2D images, one of which is selected for conversion into a 3D model. These generative AI frameworks can clearly accelerate the creation of bio-inspired designs and prototypes. By leveraging the generative "creative" capabilities of BioinspiredLLM, researchers can be guided by unique ideas supported by mechanical insights. With the assistance of generative AI technology, the timeline for the design and development of bio-inspired materials can be significantly shortened.

Summary

The paper proposes BioinspiredLLM, a conversational large language model with expertise in structural biological materials. This model, which leverages deep learning techniques and is specialized for biological materials, substantially outperforms its base model.

Notably, the text and data mining and cleaning performed through the Q-A processing distillation technique has been successful in creating realistic dialogue from formal writing and significantly reducing text fragments.

In addition, BioinspiredLLM is able to provide accurate and concise information about biomaterials, especially when combined with Retrieval-Augmented Generation (RAG). Furthermore, the model can integrate knowledge from pre-training and fine-tuning to provide new insights and creative ideas about biological materials, including materials that have never been explicitly studied before. BioinspiredLLM can also be a powerful aid to researchers in tasks such as dataset generation, grouping, and clustering.
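The idea behind RAG can be shown with a minimal Python sketch: relevant passages are retrieved for a query and placed in the prompt so the model can answer from them. This is an illustration under stated assumptions, not the setup used in the paper. Retrieval here uses simple bag-of-words cosine similarity, where a real system would typically use learned embeddings, and the prompt template and the function names `retrieve` and `build_rag_prompt` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Put the retrieved passages in front of the question (assumed template)."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer the question using the context.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )
```

The resulting prompt is then passed to the language model as usual, so the answer can draw on the retrieved text as well as on what the model learned during training.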

One of the most interesting aspects is the remarkable potential that BioinspiredLLM shows in working with other generative AI models. Such a novel generative AI cooperative agent framework could dramatically reduce the time and resources required to design and develop bio-inspired materials. The BioinspiredLLM effort is expected to bring a new dimension to the study of biomaterials and the materials science inspired by them, and to open up new horizons.

Takumu: I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.
