
A New Prompting Method For Integrating Strategic Knowledge, Strategic Chain-of-Thought (SCoT), Has Emerged!

Chain-of-Thought

3 main points
✔️ Propose Strategic Chain-of-Thought (SCoT), a new prompting method to improve LLM inference quality
✔️ Integrating strategic knowledge before generating intermediate inference steps in CoT enables high-quality and stable output
✔️ Experiments on multiple datasets have demonstrated its effectiveness

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation
written by Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu
(Submitted on 5 Sep 2024)
Comments: 
Published on arxiv.
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)

code:

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Chain-of-Thought (CoT) is an important approach for improving the inference capability of Large Language Models (LLMs) and has been widely used, especially in the natural language field.

However, for complex inference tasks this method is less effective: the quality of the generated inference paths is inconsistent, making inference performance unstable.

Against this background, this article describes a paper that proposes a new approach, Strategic Chain-of-Thought (SCoT), which integrates strategic knowledge before generating intermediate inference steps and significantly improves the performance of LLMs on complex inference tasks.

Strategic Knowledge

While LLMs tend to generate a variety of CoT paths for the same problem, their quality can vary widely.

For example, as shown on the left side of Figure (a) above, when solving the math problem "compute the sum of all integers s such that -26 < s < 24," there are two possible approaches.

  1. Pair up terms from the two ends and sum the pairs to reach the final answer.
  2. Use the arithmetic series sum formula to calculate the final result directly.

Both approaches are effective in solving the problem, but approach 1 is generally less stable in output due to the complexity of intermediate steps, while approach 2 yields higher quality and more stable output.
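As a toy illustration (not taken from the paper), the two strategies can be written out in code; the formula-based strategy reaches the result in a single step:

```python
# Two strategies for "sum all integers s with -26 < s < 24", i.e. s = -25 .. 23.

def pairwise_sum():
    # Strategy 1: repeatedly pair the smallest and largest remaining terms
    # and accumulate the pair sums (many intermediate steps).
    terms = list(range(-25, 24))
    total = 0
    while terms:
        total += terms.pop(0)
        if terms:
            total += terms.pop()
    return total

def arithmetic_series_sum():
    # Strategy 2: arithmetic series formula n * (first + last) / 2 (one step).
    first, last = -25, 23
    n = last - first + 1
    return n * (first + last) // 2

print(pairwise_sum(), arithmetic_series_sum())  # both evaluate to -49
```

Each extra intermediate step in strategy 1 is another place where an LLM can slip; strategy 2 collapses the computation into one application of a known formula.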

Methods and principles, such as the arithmetic series sum formula here, that logically lead the LLM to the desired result are called strategic knowledge, and they play a very important role in the stability of CoT generation.

Strategic Chain-of-Thought

This paper proposes a new prompt-based method, Strategic Chain-of-Thought (SCoT), which improves the inference quality of LLM based on Strategic knowledge.

A comparison of conventional CoT and SCoT, the method proposed in this paper, is shown in the figure below.

Traditional CoT suffers from inefficient inference paths and dependence on external knowledge sources, so the quality of the generated answers varies.

SCoT, on the other hand, elicits strategic knowledge before the model directly generates answers, performing two critical steps in a single query setting:

  1. Strategy Elicitation: The model first identifies one of the most effective and efficient ways to solve the problem.
  2. Answer Generation: Apply the identified strategic knowledge to derive the final answer.
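A minimal sketch of how the two steps might be packed into one query follows; the wording is hypothetical (the paper's actual template appears in its figure), and `build_scot_prompt` is an illustrative helper, not code from the paper:

```python
def build_scot_prompt(question: str) -> str:
    # Hypothetical single-query SCoT template: elicit a strategy first,
    # then apply it to produce the final answer.
    return (
        "Step 1 (Strategy Elicitation): Identify the most effective and "
        "efficient method or principle for solving the problem below.\n"
        "Step 2 (Answer Generation): Apply that strategy step by step to "
        "derive the final answer.\n\n"
        f"Problem: {question}\n"
        "Strategy:"
    )

print(build_scot_prompt("Compute the sum of all integers s such that -26 < s < 24."))
```

The key design point is that both steps live in one prompt, so no second round trip to the model is needed.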

The SCoT prompt template for these two steps is shown below.

By performing the two steps described above with this prompt, the output of responses is more stable and of higher quality than a normal CoT.

Experiment

To demonstrate the effectiveness of SCoT, this paper conducted experiments using the following LLMs:

  • Llama3 series (Llama3-8B, Llama3-70B, Llama3.1-8B, Llama3.1-70B)
  • Llama2 series (Llama2-7B, Llama2-13B, Llama2-70B)
  • Qwen2 series (Qwen2-7B, Qwen2-72B)
  • Mistral-7B
  • ChatGLM4-9B

In addition, the three prompting techniques of regular CoT, Self-Consistency, and Step Back are used as a baseline for comparison with SCoT.
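For orientation, the Self-Consistency baseline can be sketched as sampling several reasoning paths and majority-voting their final answers; the `toy_sampler` below is a stand-in for an LLM call, not the paper's implementation:

```python
from collections import Counter

def self_consistency(sampler, prompt, n_samples=5):
    # Sample n reasoning paths, keep only their final answers,
    # and return the most frequent answer (majority vote).
    answers = [sampler(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Deterministic toy sampler standing in for repeated LLM sampling.
_canned = iter(["-49", "-47", "-49", "-49", "-50"])
def toy_sampler(prompt):
    return next(_canned)

print(self_consistency(toy_sampler, "sum question"))  # "-49" wins 3 of 5 votes
```

Self-Consistency improves stability by brute-force sampling, whereas SCoT aims to stabilize a single generation by fixing the strategy up front.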

A total of eight datasets were used, including MMLU and SQA, which require mathematical and physical reasoning skills.

Experimental results for all models for the three data sets are shown in the table below.

Experimental results showed that SCoT improved performance in most models.

In particular, performance was significantly improved on the Object dataset, which requires spatial inference capabilities, and the experimental results demonstrate the effectiveness of SCoT.

In addition, to investigate the impact of model size on the effectiveness of SCoT, experiments were conducted with three different sized Llama2 models.

Experimental results for the three datasets (MathQA, MMLU, and CSQA) are shown in the table below.

From this experiment, the accuracy improvement was confirmed for all sizes of Llama2 models using SCoT.

On the other hand, the performance gains shrank slightly as model size increased, suggesting that larger models are already better able to utilize strategic knowledge on their own.

Automatic SCoT

Additional experiments in this paper were conducted to evaluate whether SCoT prompts can be generated automatically.

In the experimental setup, the authors fed the SCoT concept into Qwen2-72B, generated the prompt template shown below, and evaluated its accuracy on the AQuA dataset.
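The setup above can be sketched as a small meta-prompting loop; the `META_PROMPT` wording and the `llm_call` wrapper are illustrative assumptions, not the paper's exact inputs:

```python
# Hypothetical sketch of the automatic-SCoT setup: describe the SCoT concept
# to a model and ask it to write a reusable prompt template.
META_PROMPT = (
    "Strategic Chain-of-Thought (SCoT) first elicits the most effective "
    "strategy for solving a problem, then applies that strategy to generate "
    "the answer, all within a single query.\n"
    "Write a reusable SCoT prompt template for math word problems that "
    "contains a {question} placeholder."
)

def generate_scot_template(llm_call):
    # llm_call: any function str -> str wrapping the chosen model
    # (the paper used Qwen2-72B).
    return llm_call(META_PROMPT)

# With a stub model, the function simply relays the meta-prompt.
template = generate_scot_template(lambda p: p)
print("{question}" in template)  # True
```

The generated template can then be filled with each AQuA question and evaluated like any hand-written prompt.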

The results are shown in the table below.

It can be seen that the automatically generated SCoT prompts achieve lower accuracy than the manually designed SCoT prompts, but still outperform plain CoT.

The results suggest that automatic generation of prompt templates based on SCoT is feasible.

Summary

How was it? In this article, we described a paper that proposed a new approach, Strategic Chain-of-Thought (SCoT), which integrates strategic knowledge before generating intermediate inference steps and significantly improves the performance of LLMs on complex inference tasks.

The SCoT proposed in this paper is a method that solves the problem of conventional CoTs in which the quality of inference is not stable, suggesting the possibility of significantly improving the performance of LLMs in complex inference tasks.

In addition, the authors state that "future research will focus on evaluating its effectiveness with more complex problems."

The details of the framework and experimental results of the prompts presented here can be found in this paper for those who are interested.
