
SheetAgent, An LLM Agent That Automatically Performs Spreadsheet-based Tasks, Is Now Available!


3 main points
✔️ Constructed SheetRM, a new benchmark for developing and evaluating LLM agents for spreadsheet manipulation
✔️ Proposed SheetAgent, an LLM agent with advanced reasoning and accurate spreadsheet performance
✔️ Comparative experiments show that SheetAgent significantly outperforms existing methods

SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models
written by Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao
(Submitted on 6 Mar 2024)
Published on arxiv.
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Data in spreadsheet format plays an important role in science, finance, marketing, and other fields, and is handled primarily by spreadsheet systems.

While these systems are useful for numerical calculation, data analysis, and visualization, managing spreadsheets with them often demands a great deal of time and expertise.

To address these problems, recent work has attempted to automate spreadsheet tasks using Large Language Models (LLMs).

However, these approaches could only handle tasks involving simple calculations and single-step reasoning, and were not equipped for complex, realistic tasks that require multi-step reasoning or involve ambiguous requirements.

Against this background, and to bridge the gap with real-world spreadsheet challenges, this paper proposes SheetRM, a new benchmark for developing and evaluating LLM agents that manipulate spreadsheets, and SheetAgent, an LLM agent consisting of three modules that is capable of advanced reasoning and accurate spreadsheet manipulation.

SheetRM Benchmark

Multi-Category

In this paper, a new benchmark called SheetRM is constructed to close the gap between simulated and real-world tasks by including tasks with more complex multi-step reasoning and ambiguous requirements that existing benchmarks do not cover.

An overview of SheetRM is shown in the figure below.

To make the benchmark more realistic and challenging, SheetRM includes real tasks that span multiple operation categories and require multi-step reasoning, as shown in Figure (a).

Specifically, it covers five major categories and 36 subcategories of manipulation operations, plus four corresponding reasoning challenges, so that each task tests both the ability to manipulate the spreadsheet and the ability to reason about it.

Task Schema

Each task in SheetRM is defined by three parts (a rough data-structure sketch follows the list):

  1. Spreadsheet Assets: Each task consists of multiple spreadsheets, and the content of each spreadsheet is summarized in a single natural-language sentence so that the LLM's internal knowledge can be assessed.
  2. Task Instruction: A high-level task expressed in natural language by the user. Completing it requires a series of operations on the target spreadsheet.
  3. Checklist: As shown in Figure (b), each task is paired with a checklist designed to evaluate its completion, with each checklist item corresponding to a fine-grained evaluation of one operation under tailored criteria.
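As a rough illustration, the schema above could be represented by a small data structure like the one below. The field names here are our own guesses based on the three-part description, not the benchmark's actual format.

```python
# Hypothetical representation of one SheetRM task; field names are assumptions
# derived from the three-part description above, not the benchmark's real schema.
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    description: str   # the operation whose completion is checked
    criterion: str     # tailored pass/fail rule for that operation


@dataclass
class SheetRMTask:
    spreadsheet_paths: list[str]    # the spreadsheets the task operates on
    sheet_summaries: list[str]      # one-sentence natural-language summary per sheet
    instruction: str                # high-level task expressed in natural language
    checklist: list[ChecklistItem]  # fine-grained completion checks (Figure (b))
```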

SheetAgent Framework

To address the challenges posed by SheetRM, this paper proposes SheetAgent, an LLM-based agent framework.

SheetAgent consists of three main components: Planner, Informer, and Retriever, as shown in the figure below.

Each of these will be explained.

Planner

The following figure shows the prompt template for Planner in SheetAgent.

The Planner is responsible for manipulating spreadsheets: it uses a ReAct-based approach to reason about the task and to generate Python code that operates on the target spreadsheet.
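A minimal sketch of this idea is shown below, assuming an openpyxl sandbox and a ReAct-style observe-think-act loop; the call_llm function and the prompt wording are placeholders, not the paper's actual implementation.

```python
# Minimal sketch of a Planner step: observe the workbook, ask the LLM for a
# thought plus a Python action, execute the action, and feed back the result.
# `call_llm` is a hypothetical stand-in for the real LLM backend.
from openpyxl import load_workbook


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your own API client."""
    raise NotImplementedError


def planner_step(task: str, workbook_path: str, history: list[str]) -> str:
    wb = load_workbook(workbook_path)
    prompt = (
        "You manipulate spreadsheets with openpyxl; the workbook is bound to `wb`.\n"
        f"Task: {task}\nSheets: {wb.sheetnames}\n"
        + "\n".join(history)
        + "\nThought: reason step by step, then output Python code after 'Action:'."
    )
    response = call_llm(prompt)                     # "Thought: ... Action: <code>"
    code = response.split("Action:", 1)[-1].strip()
    try:
        exec(code, {"wb": wb})                      # sandboxed in a real system
        wb.save(workbook_path)
        observation = "Success"
    except Exception as err:                        # errors become the next observation
        observation = f"Error: {err}"
    history.append(f"{response}\nObservation: {observation}")
    return observation
```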

Informer

The figure below shows the Informer prompt template in SheetAgent.

The Informer in SheetAgent is responsible for generating task-specific SQL queries, which allows the Planner to locate the target data in the spreadsheet more accurately and efficiently and to handle reasoning challenges effectively.
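The sketch below illustrates this division of labor under our own assumptions: the sheet is loaded into an in-memory SQLite table via pandas, and a hypothetical generate_sql call stands in for the Informer's LLM; the paper's actual query-execution machinery may differ.

```python
# Sketch of the Informer idea: turn the instruction into SQL and run it over the
# sheet so the Planner only sees the relevant sub-table. `generate_sql` is a
# hypothetical LLM call; the pandas/SQLite plumbing is our assumption.
import sqlite3

import pandas as pd


def generate_sql(instruction: str, columns: list[str]) -> str:
    """Hypothetical LLM call returning a SELECT statement over the table `sheet`."""
    raise NotImplementedError


def informer_query(xlsx_path: str, sheet_name: str, instruction: str) -> pd.DataFrame:
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql("sheet", conn, index=False)
        sql = generate_sql(instruction, list(df.columns))
        return pd.read_sql_query(sql, conn)  # handed to the Planner as an observation
    finally:
        conn.close()
```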

Retriever

The Retriever in SheetAgent advises the Planner during task planning and augments error correction by retrieving relevant code from a code repository.

To improve search efficiency, the open-source vector database Milvus (Wang et al., 2021) is used as the code repository.
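As a minimal stand-in for this retrieval step, the sketch below performs a cosine-similarity search over pre-computed snippet embeddings with NumPy; the paper itself uses Milvus as the vector database, and the embed function here is a hypothetical placeholder for whatever encoder is used.

```python
# Simplified stand-in for the Retriever's code lookup: nearest-neighbor search
# over snippet embeddings. The paper stores snippets in Milvus; `embed` is a
# hypothetical embedding function.
import numpy as np


def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model; replace with your own encoder."""
    raise NotImplementedError


def retrieve_snippets(query: str, snippets: list[str], top_k: int = 3) -> list[str]:
    q = embed(query)
    vecs = np.stack([embed(s) for s in snippets])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-8)
    best = np.argsort(-sims)[:top_k]          # most similar snippets first
    return [snippets[i] for i in best]        # returned as advice for the Planner
```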

Experiment

To verify SheetAgent's performance, this paper runs experiments on an existing benchmark, the SheetCopilot Benchmark (SCB), as well as on the aforementioned SheetRM.

As baselines for SheetAgent, the paper also uses SheetCopilot, an LLM-based agent framework, and a VBA-based method that generates and executes VBA code.

In addition, the experiments employ the following three evaluation metrics, following the existing SCB benchmark (a small computation sketch follows the list):

  1. Exec@1: the percentage of tasks that execute without raising an exception
  2. Pass@1: the percentage of tasks that are fully accomplished
  3. SubPass@1: the percentage of subtasks accomplished within each task
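To make the metrics concrete, here is a small sketch of how they could be computed from per-task result records; the record layout (raised_exception, subtask_passed) is our own assumption for illustration, not the benchmark's actual logging format.

```python
# Hypothetical per-task records: whether execution raised an exception and which
# subtasks (checklist items) passed. Pass@1 is approximated as "all subtasks passed".
def evaluate(results: list[dict]) -> dict[str, float]:
    n = len(results)
    exec_at_1 = sum(not r["raised_exception"] for r in results) / n
    pass_at_1 = sum(all(r["subtask_passed"]) for r in results) / n
    subpass_at_1 = sum(
        sum(r["subtask_passed"]) / len(r["subtask_passed"]) for r in results
    ) / n
    return {"Exec@1": exec_at_1, "Pass@1": pass_at_1, "SubPass@1": subpass_at_1}


# Two toy tasks: the second raised an exception and passed only one of two subtasks.
print(evaluate([
    {"raised_exception": False, "subtask_passed": [True, True]},
    {"raised_exception": True, "subtask_passed": [True, False]},
]))
# {'Exec@1': 0.5, 'Pass@1': 0.5, 'SubPass@1': 0.75}
```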

Experimental results at SCB are shown in the table below.

From the table, it can be seen that SheetAgent achieves a Pass@1 that is 16.8% higher and an Exec@1 that is 6.8% higher than SheetCopilot, outperforming it overall.

This demonstrates that SheetAgent has more advanced reasoning capabilities and is able to manipulate complex spreadsheets.

In addition, the figure below compares SheetAgent and SheetCopilot on a task that includes a reasoning challenge.

As shown in the figure, SheetCopilot was unable to generate a solution that satisfied the instruction, whereas SheetAgent correctly identified the intent of the instruction from the given information and generated a superior solution.

Summary

In this article, we introduced SheetRM, a new benchmark for developing and evaluating LLM agents that manipulate spreadsheets, built to bridge the gap with real-world spreadsheet challenges, and SheetAgent, the proposed LLM agent consisting of three modules that is capable of advanced reasoning and accurate spreadsheet manipulation.

Comparative experiments conducted in this paper confirm that the proposed SheetAgent can handle more complex spreadsheets and more complex tasks than existing methods.

Although the code is not yet publicly available on GitHub, the framework is expected to be open-sourced so that anyone can use it. We look forward to future developments, as this could free many people from tedious spreadsheet work.

Readers interested in the details of the framework and the experimental results are encouraged to consult the original paper.

