GPT-4V's Usefulness For Reading Scientific Papers Without Programming

Large Language Models 05/06/2024

3 main points
✔️ Streamline the analysis of data and information in chemistry
✔️Strengthens role in closing the gap between chemical research and advanced computational tools
✔️ Dramatically improves extraction and analysis of critical information, especially in areas such as reticular chemistry

Image and Data Mining in Reticular Chemistry Using GPT-4V
written by Zhiling Zheng, Zhiguo He, Omar Khattab, Nakul Rampal, Matei A. Zaharia, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi
(Submitted on 9 Dec 2023)
Comments: Published on arxiv.
Subjects: Artificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

code：

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

The use of artificial intelligence (AI) in chemistry is evolving rapidly as its potential expands. In particular, the emergence of large-scale language models has greatly expanded the role of AI in chemical research. These models have attracted significant attention because of their superior ability to support a wide variety of tasks in chemical research and their ability to be easily "programmed" or "taught" using natural language. Now evolving from text-only to multimodal, processing diverse information, large-scale languagemodels have established themselves as powerful and useful AI assistants for a wide variety of applications.

At the forefront of this evolution is the GPT-4V. The "V" stands for its visual capabilities, and this model's ability to understand visual as well as textual information goes far beyond traditional models in its ability to find and analyze valuable data from graphical representations of scientific literature. this ability of GPT-4V means that even researchers without specialized programming knowledge or computer vision skills, but can also be leveraged by researchers with customized instructions.

This paper shows how GPT-4V is being applied to the study of reticular chemistry. GPT-4V's ability to integrate and interpret textual and graphical data from scientific papers dramatically improves the extraction and analysis of critical information, particularly the importance of reading physical property results from graphical content. This approach is not limited to reticular chemistry, suggesting that automated literature analysis can be extended to other scientific disciplines.

The introduction of GPT-4V demonstrates that AI will further enhance its role in fostering scientific innovation and discovery and closing the gap between advanced computational tools and cutting-edge chemical research.

Initial evaluation of GPT-4V performance

Here we evaluate the performance of GPT-4V by recognizing and interpreting diagrams commonly found in the literature on reticular chemistry. In particular, we focus on nitrogen isotherms, powder X-ray diffraction (PXRD) patterns, thermogravimetric analysis (TGA) curves, nuclear magnetic resonance (NMR) and infrared (IR) spectra, and various plots including scatter plots, bar charts, 2D and 3D molecular structures to see if GPT-4V can adequately explain these We will also focus on synthesis schemes, microscopy, andWe also analyze experimental images, including synthetic schemes, microscopy, and scanning electron microscopy (SEM) images. The figure below is an example.

When prompted for a detailed description for each figure, GPT-4V not only accurately classifies its images, but also demonstrates an impressive ability to talk in depth about specific details, from annotations and axis ranges, to color coding, symbol and line shapes, labels and legends. In addition, they are able to draw inferences based on information from the provided figure captions. This advanced contextual data interpretation and comprehensive analysis underscores GPT-4V's suitability as a powerful AI assistant for image and data mining in the scientific literature.

Prompt design for page content labeling

The purpose of this paper is to verify whether GPT-4V can autonomously browse scientific papers, identify specific information, compile it into a comprehensive data set, and analyze it. Particular focus was placed on important charts showing the physical properties of metal-organic frameworks (MOFs) - nitrogen isotherms, powder X-ray diffraction (PXRD) patterns, thermogravimetric analysis (TGA) curves, crystal structure and topology diagrams, and other gas adsorption isotherms. These are essential for elucidating important properties of chemical compounds, such as permanent porosity, crystallinity, thermal stability, topology, and selectivity for gases. Efficiently extracting information from these diagrams and integrating them within the vast literature has great potential to improve our understanding of structure-property relationships and accelerate the discovery of new compounds.

To achieve this goal, we have designed specific prompts that target the categories described above using GPT-4V. These prompts take into account the possibility of multiple choices on a single page due to the common coexistence of various figures and tables in the scientific literature. In addition, if a particular category is missing, GPT-4V is instructed to clearly indicate its absence. As a result, GPT-4V offers a total of six choices. The development of this prompt is guided by the basic principles of text mining prompt engineering. An overview is shown in the figure below.

GPT-4V Performance Evaluation

Here, each page of selected literature is imaged and analyzed by GPT-4V. Specifically, page images are combined with specially designed text prompts and responses are collected by GPT-4V, allowing for automatic content classification and identification of pages containing plots for further analysis. This process allows GPT-4V to automatically label each page based on content according to a specific response format.

Although copyright restrictions prevented us from sharing actual images, we illustrated this figure content identification process with a representative example that mimics the layout and content of a page from actual published literature. GPT-4V accurately identifies the desired plot on each page, regardless of the complexity of the information, demonstrate the ability to label them.

To evaluate the classification accuracy of GPT-4V, it was compared to the Ground Truth dataset, which contains 6,240 images manually reviewed and labeled by reticular chemistry experts. Results showed a high accuracy of 94% or better for all categories, but accuracy, recall, and F1 scores varied between 87% and 99% for all categories except "other gas adsorption isotherms." The low accuracy in this category could be attributed to inadequate prompt instructions and occasional mislabeling of IR and NMR spectra, suggesting opportunities for further refinement of the specificity of the prompts.

In addition, GPT-4V's performance demonstrates similar accuracy rates in both the web interface and API, proving the consistency of the underlying model.

This automated process offers a variety of operational options with high performance in gathering information from the literature. The number of pages that GPT-4V identified for the presence of nitrogen isotherms, PXRD patterns, and TGA traces in the analysis by confusion matrix shows the amount of data in the vast amount of literature.

In addition, many pages were classified as lacking plots of interest, which may help researchers streamline the process of reviewing certain types of literature plots in the future.

Interpretation of nitrogen isotherm data by GPT-4V

This section examines how GPT-4V can be used for detailed interpretation and analysis of pages featuring nitrogen isotherm plots after successful labeling of page content. We refine the prompting strategy to incorporate additional specific language that guides GPT-4V to recognize nitrogen isotherms and extract and report key information from each plot.

This includes the figure number, compound name, surface area and pore volume values, the presence or absence of hysteresis in the adsorption-desorption curve, the saturation plateau of the isotherm, and the estimated bounding box surrounding the figure.

The key to this approach was to instruct GPT-4V to use only the information available on the page image and "N/A" for data not available. As a result, GPT-4V has shown an impressive ability to efficiently extract these details by analyzing isotherms and their associated axes, legends, and textual content.

To confirm the accuracy of this analysis of GPT-4V, we also manually reviewed over 200 pages of reactions from selected papers, including nitrogen isotherms. A high level of accuracy was observed, particularly in figure numbering, compound name, and porosity analysis. This suggests that GPT-4V probably utilizes optical character recognition (OCR) tools in its image processing capabilities. In addition, GPT-4V's high level of proficiency with text seems to have a positive impact on tasks related to textual information that can be read directly from images.

However, for the other three descriptors, including presence of hysteresis, saturation plateau, and bounding box estimation, the performance was generally satisfactory, ranging from 76.25% to 84.58%. These tasks are more advanced and subtle challenges that require a comprehensive analysis of all image elements. Nevertheless, the overall performance is particularly impressive, and the simplicity with which the researcher can instruct GPT-4V in natural language further underscores the power of this technique.

Accelerating Digital Databases in Reticular Chemistry

Here, we explore the possibility of utilizing GPT-4V to streamline the construction of detailed databases of reticulate compounds. In particular, we identify pages with distinctive nitrogen isotherm plots based on experimental results obtained from literature published by the scientific community, and carefully extract these data, which are usually in non-digital format, using tools such as WebPlotDigitizer. Through this process, the extracted data are systematically compiled and stored in a database. This method provides a real-world example of a collection of nitrogen isotherm data points that exhibit a wide variety of isotherm types and porosity characteristics.

In addition, the CoRE MOF database is utilized to match calculated and experimental results for compounds discussed in the paper, allowing for comparisons between theoretical and experimental values. In this analysis, theoretical values for each compound are plotted against surface and pore volumes obtained from experiments in a scatter plot, revealing general trends among compounds.

The comparison shows that differences exist between theoretical predictions and experimental results, even based on experimentally determined structures. This highlights the risk of relying solely on calculated results in material selection.

Insights from this study suggest the applicability of GPT-4V across a wide range of scientific disciplines, not just reticular chemistry. Effective database construction requires skillful prompt design, and the introduction of innovative tools such as DSPy has the potential to further enhance the research process and accelerate the evolution of natural language processing tools. This advancement is expected to expand the scope of data mining from the literature and further advance the use of AI tools in scientific research.

Summary

This papershows how usefulGPT-4V can be for text, image, and data mining in the field of reticular chemistry. Focusing on GPT-4V's ability to process page images using uniquely designed prompts, the paper successfully identifies and classifies pages that contain the exact information needed. Of particular note is the fact that this approachsuggests that it may be applicable not only toreticular chemistry, but also to other areas of science.

Large-scale language models such as GPT-4V can be "programmed" using the natural language one normally uses, removing the barriers of coding techniques and special model learning to recognize specific charts and plots. This flexibilityhighlights the fact that analytical transitions from, for example, TGA curves to completely different data types, such as water isotherms, can be accomplished with only simple changes in the prompts.

In addition, the integration of advanced platforms such as DSPy is proposed to make the use of GPT-4V even more effective. This will open up new possibilities in scientific data mining and make AI a more accessible and user-friendly tool in the development of scientific knowledge. Thisapproach is expected to greatly improve the efficiency of work in the area of scientific research and open up opportunities to extract even more data from the literature.

Categories related to this article

Takumu: I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.

GPT-4V's Usefulness For Reading Scientific Papers Without Programming

Summary

Initial evaluation of GPT-4V performance

Prompt design for page content labeling

GPT-4V Performance Evaluation

Interpretation of nitrogen isotherm data by GPT-4V

Accelerating Digital Databases in Reticular Chemistry

Summary

Libra] A New Multimodal Design Of Large Language Models Using Separate Vision Systems

Libra] A New Multimodal Design Of Large Language Models Using Separate Vision Systems

Construction And Analysis Of The "TruthEval" Dataset To Expose LLM Weaknesses

Construction And Analysis Of The "TruthEval" Dataset To Expose LLM Weaknesses

SportQA, A New Dataset That Measures The Comprehension Of Sports In Large Language Models

SportQA, A New Dataset That Measures The Comprehension Of Sports In Large Language Models

Proposal For A New Evaluation Method For AI Assistants Based On Human Preferences

Proposal For A New Evaluation Method For AI Assistants Based On Human Preferences

The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Models

The Future Of Music Education, Flute X GPT And LAUI's Potential To Change Large-Scale Language Model ...

Prediction Of Handball Results For The 2024 Paris Olympics And Explanation Of The Basis For The Prediction Using LLM

Prediction Of Handball Results For The 2024 Paris Olympics And Explanation Of The Basis For The Pred ...