
LLM From Memory To Retrieval: Theoretical Advantages And Demonstrations Of In-Tool Learning
3 main points
✔️ In-weight learning stores knowledge in model parameters, so the number of facts retained is capped by parameter count
✔️ In-tool learning, which uses external tools, can reference an effectively unlimited number of facts, making it efficient and scalable
✔️ Experiments show that in-tool learning avoids performance degradation and generalizes to unseen data
Provable Benefits of In-Tool Learning for Large Language Models
written by Sam Houliston, Ambroise Odonnat, Charles Arnal, Vivien Cabannes
(Submitted on 28 Aug 2025)
Comments: Published on arXiv.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
This paper clarifies the theoretical advantages of in-tool learning, in which an LLM utilizes external tools.
Conventional LLMs have relied on "in-weight learning," in which knowledge is embedded into the parameters at training time.
However, this method has fundamental limitations.
The number of facts a model can store grows only in proportion to its parameter count, so it does not scale indefinitely, and forgetting and interference occur.
In contrast, the authors theoretically prove that in-tool learning, which utilizes external databases and APIs, is independent of the model's parameter count and can in principle reference an unbounded amount of knowledge.
Furthermore, experiments support the effectiveness of in-tool learning.
The authors argue that learning rules and procedures for using tools is, in the long run, more efficient and scalable than forcing factual memories into the model's weights.
This study is an important result showing that the design philosophy of LLMs should shift from "memorization through sheer scale" to "coordination with external knowledge."
Proposed Methodology
The authors formally defined the difference between "in-weight learning" and "in-tool learning" using fact-retrieval tasks.
In in-weight learning, the model generates answers directly from the input, whereas in in-tool learning it generates queries to an external database and formats the returned results into an answer.
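The contrast between the two modes can be sketched in a few lines of Python. This is purely illustrative (not the authors' code): a dict stands in for both the external database and, in the in-weight case, the model's parameters.

```python
# External store standing in for a database or API tool.
FACTS = {
    ("Alice", "birthplace"): "Paris",
    ("Bob", "occupation"): "engineer",
}

# In-weight learning: every fact must be baked into the "parameters"
# (here, a dict playing the role of model weights), so the parameter
# footprint grows with the number of facts.
WEIGHTS = dict(FACTS)

def answer_in_weight(question):
    return WEIGHTS.get(question)

# In-tool learning: the model only learns the *rule* for forming a
# query; the facts stay external, so the rule's size is constant
# no matter how large the database becomes.
def answer_in_tool(question):
    entity, attribute = question            # query-formation rule
    return FACTS.get((entity, attribute))   # external lookup, then reply
```

The point of the sketch: adding a new fact changes `FACTS` but leaves `answer_in_tool` untouched, whereas the in-weight path must absorb every new fact into `WEIGHTS`.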
Under this framework, the authors first derived a theoretical lower bound, proving that in-weight learning can retain only a number of facts proportional to the model's parameter count.
Then, for in-tool learning, they showed that even a model with a limited number of parameters can accurately recall an arbitrary number of facts through external retrieval.
Furthermore, the authors gave a theoretical construction showing that a Transformer can implement tool invocation, and proved that the number of required parameters grows proportionally to the square of the number of attributes.
This framework rigorously establishes in-tool learning as a way to access knowledge beyond capacity constraints.
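In rough notation (our symbols, paraphrasing the summary above rather than the paper's exact statements), the contrast is between a fact count bounded by parameters and a parameter count bounded by the schema:

```latex
% P: model parameter count, k: number of attributes per fact
\underbrace{N_{\text{facts}} = O(P)}_{\text{in-weight: capacity-bound}}
\qquad \text{vs.} \qquad
\underbrace{P_{\text{tool}} = O(k^{2}),\quad N_{\text{facts}} \text{ unbounded}}_{\text{in-tool: rule-bound}}
```

In other words, in-weight capacity scales with model size, while the cost of the tool-calling rule depends only on the structure of the queries, not on how many facts the database holds.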
Experiments
To corroborate the theoretical results, the authors conducted two types of experiments.
First, in a controlled experiment with a small-scale Transformer, comparisons were made using synthetic person data (name, birthplace, date of birth, occupation, etc.).
With in-weight learning, the number of required parameters grew linearly with the number of facts, and beyond a certain size accurate memorization became difficult.
In contrast, with in-tool learning, a clear tipping point appeared after about 1,000 examples, confirming that the model does not memorize facts directly but instead learns query rules and generalizes them to unseen data.
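The synthetic setup described above can be mimicked with a toy experiment. All names and field choices below are our own hypothetical stand-ins, not the paper's actual data; the sketch only shows why a query rule generalizes to unseen people while pure memorization cannot.

```python
import random

def make_people(n, seed=0):
    """Generate n synthetic biographies, as in the paper's controlled setup."""
    rng = random.Random(seed)
    return {
        f"person_{i}": {
            "birthplace": rng.choice(["Paris", "Lyon", "Nice"]),
            "date_of_birth": f"19{rng.randrange(10, 100)}",
            "occupation": rng.choice(["doctor", "engineer", "artist"]),
        }
        for i in range(n)
    }

DB = make_people(2000)                        # external database of facts
train_names = [f"person_{i}" for i in range(1000)]

# In-weight analogue: only facts seen in training fit in "the weights".
memorized = {name: DB[name] for name in train_names}

def answer_in_weight(name, attribute):
    person = memorized.get(name)              # unseen people are simply absent
    return person[attribute] if person else None

# In-tool analogue: a fixed attribute-level query rule plus the external
# database; the rule is independent of which people were in training.
def answer_in_tool(name, attribute):
    return DB[name][attribute]
```

Here `answer_in_weight` fails on any person outside the training split, while `answer_in_tool` answers correctly for all 2,000, which mirrors the generalization behavior reported in the experiments.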
Second, fact-addition fine-tuning was performed on existing pre-trained models such as Llama and SmolLM.
The results showed that the in-weight method degraded linguistic performance and shifted the output distribution, while the in-tool method scaled while preserving most of its performance.
These results strongly indicate that tool use is efficient and sustainable in practice.