Visualizing the "Inside of the Head" of a Language Model - The Internal Mechanisms of LLMs Revealed by a Knowledge Graph
3 main points
✔️ Extracts factual knowledge from LLM latent representations in the form of zero-order predicate logic and visualizes its layer-by-layer evolution as a temporal knowledge graph
✔️ Local analysis reveals entity-resolution behavior and multi-hop reasoning failures, while global analysis uncovers characteristic transition patterns across layers
✔️ Provides important implications for improving the reliability and safety of artificial intelligence systems
Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph
written by Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini
(Submitted on 1 Jul 2021)
Comments: Preprint. Under review. 10 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
code:
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
Large language models (LLMs), currently the most sophisticated language-understanding systems in artificial intelligence, have recently attracted great expectations. These models have been found to hold a remarkable amount of commonsense and factual knowledge. However, how that knowledge is structured internally and used for reasoning remains largely a mystery.
This research approaches the internal mechanisms of LLMs to clarify how they make use of such knowledge. Specifically, it identifies the factual knowledge that an LLM refers to when judging whether a claim is true or false, analyzes how that knowledge evolves across the model's layers, and extracts characteristic patterns.
The proposed method uses a technique called activation patching to extract formal knowledge from the latent representations of an LLM and visualizes it as a temporal knowledge graph. This is expected to provide important insights into how LLMs resolve factual knowledge. Improving the interpretability of language models is an important issue that also bears directly on the reliability and safety of AI technology.
Proposed Method
The core of the proposed method is to extract factual knowledge from the latent representations of an LLM and visualize its evolution across layers (see Figure 1). First, the hidden-layer latent representations are extracted during the LLM's inference on an input claim. Next, these latent representations are dynamically patched into the inference on a different input sentence: specifically, the latent representations corresponding to the subjects and predicates of the input are replaced with pre-computed weighted-average representations, as sketched below.
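To make the patching step concrete, here is a minimal sketch of activation patching with PyTorch forward hooks. It uses GPT-2 as a stand-in model; the layer index, token position, and the replacement vector (a random placeholder standing in for the pre-computed weighted-average representation) are illustrative assumptions, not the paper's exact setup.

```python
# Minimal activation-patching sketch (GPT-2 as a stand-in model).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6       # hidden layer to patch (assumption)
TOKEN_POS = 3   # token position of, e.g., the subject (assumption)

# Pre-computed replacement vector; in the paper this would be a weighted
# average of latent representations, here it is just a random placeholder.
patch_vector = torch.randn(model.config.n_embd)

def patch_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element holds the hidden
    # states of shape (batch, seq_len, hidden_dim); overwrite one position.
    hidden = output[0].clone()
    hidden[:, TOKEN_POS, :] = patch_vector
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)

inputs = tokenizer("The Eiffel Tower is located in", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

handle.remove()  # detach the hook after the patched forward pass
print(tokenizer.decode(logits[0, -1].argmax().item()))
```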
By repeating this substitution operation layer by layer, the factual knowledge that the LLM references internally can be extracted step by step. The extracted knowledge is expressed in the form of zero-order predicate logic and assembled into a temporal knowledge graph. This framework makes it possible to dynamically analyze how the LLM resolves factual knowledge.
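As an illustration of the graph-construction step, the sketch below assembles layer-tagged triples into a temporal knowledge graph with networkx. The triples are invented for the example; in the actual framework they would be decoded from the patched latent representations.

```python
# Sketch: build a layer-indexed ("temporal") knowledge graph from decoded triples.
import networkx as nx

# (layer, subject, predicate, object) tuples decoded per hidden layer (dummy data)
decoded_facts = [
    (2,  "Eiffel Tower", "located_in", "Paris"),
    (8,  "Eiffel Tower", "located_in", "Paris"),
    (8,  "Paris",        "capital_of", "France"),
    (16, "Eiffel Tower", "height",     "330 m"),
]

tkg = nx.MultiDiGraph()
for layer, subj, pred, obj in decoded_facts:
    # Each edge carries the hidden layer it was decoded from, so the graph
    # can be sliced per layer to follow the evolution of the knowledge.
    tkg.add_edge(subj, obj, key=(pred, layer), predicate=pred, layer=layer)

def slice_at_layer(graph, layer):
    """Return the subgraph of facts decoded at a given hidden layer."""
    edges = [(u, v, k) for u, v, k, d in graph.edges(keys=True, data=True)
             if d["layer"] == layer]
    return graph.edge_subgraph(edges)

print(slice_at_layer(tkg, 8).edges(data="predicate"))
```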
In addition, a quantitative analysis using node embeddings reveals patterns of knowledge transition across hidden layers: entity resolution in the early layers, knowledge accumulation in the middle layers, and impoverished representations in the final layers. Such graph-based analysis is expected to bring new insight into the internal mechanisms of language models.
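The following sketch shows one simple way such a node-embedding analysis could be set up: each node is embedded by its in- and out-neighborhoods in that layer's adjacency matrix, and the cosine similarity of a node's embedding across consecutive layers serves as a rough transition measure. This embedding choice is an assumption for illustration, not necessarily the method used in the paper.

```python
# Sketch: compare node embeddings of layer-wise knowledge graphs.
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

def node_embeddings(graph, nodes):
    """Embed each node as its row and column of the layer's adjacency matrix."""
    g = nx.DiGraph(graph)      # collapse parallel edges
    g.add_nodes_from(nodes)    # align every layer on the same node vocabulary
    adj = nx.to_numpy_array(g, nodelist=nodes)
    return np.hstack([adj, adj.T])   # out-neighborhood + in-neighborhood

def layer_transition_similarity(layer_graphs):
    """Mean per-node cosine similarity between consecutive layers' graphs."""
    nodes = sorted(set().union(*(g.nodes for g in layer_graphs.values())))
    layers = sorted(layer_graphs)
    sims = {}
    for prev, curr in zip(layers, layers[1:]):
        e_prev = node_embeddings(layer_graphs[prev], nodes)
        e_curr = node_embeddings(layer_graphs[curr], nodes)
        # Low values indicate a strong knowledge transition between layers.
        sims[(prev, curr)] = float(np.mean(
            np.diag(cosine_similarity(e_prev, e_curr))))
    return sims

# Toy usage with two dummy layer graphs.
g_a, g_b = nx.DiGraph(), nx.DiGraph()
g_a.add_edge("Eiffel Tower", "Paris")
g_b.add_edges_from([("Eiffel Tower", "Paris"), ("Paris", "France")])
print(layer_transition_similarity({2: g_a, 8: g_b}))
```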
Experiment
To test the effectiveness of the proposed method, experiments were conducted on two fact-verification datasets, FEVER and CLIMATE-FEVER. These datasets contain a variety of factual claims that must be judged true or false.
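For reference, both benchmarks can be loaded from the Hugging Face Hub as sketched below; the dataset IDs and field names are assumptions based on the public hub entries, not taken from the paper's code.

```python
# Sketch: load the two fact-verification benchmarks (dataset IDs are assumptions).
from datasets import load_dataset

fever = load_dataset("fever", "v1.0", split="train")
climate_fever = load_dataset("climate_fever", split="test")

print(fever[0]["claim"], fever[0]["label"])                       # e.g. SUPPORTS / REFUTES
print(climate_fever[0]["claim"], climate_fever[0]["claim_label"])
```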
First, the LLM's task performance was evaluated (Table 1). On the FEVER dataset, the model achieved higher accuracy on true claims and lower recall on false claims, while performance on CLIMATE-FEVER was more balanced; this may be because the latter requires commonsense inference.
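The kind of per-class evaluation behind Table 1 can be reproduced with standard tooling, as in the following sketch; the labels and predictions shown are dummy values, not the paper's results.

```python
# Sketch: per-class precision/recall for true vs. false claims (dummy data).
from sklearn.metrics import classification_report

y_true = ["SUPPORTS", "REFUTES", "SUPPORTS", "REFUTES", "SUPPORTS"]
y_pred = ["SUPPORTS", "SUPPORTS", "SUPPORTS", "REFUTES", "REFUTES"]

print(classification_report(y_true, y_pred, labels=["SUPPORTS", "REFUTES"]))
```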
Next, a local interpretability analysis was performed (Figure 2), visualizing, for three example claims, the evolution of the factual knowledge decoded from the latent representations at each hidden layer. The early layers focused mainly on resolving entities and the intermediate layers accumulated knowledge about the subject, while the final layers tended to hold a poorer representation of factual knowledge. Failures in multi-hop reasoning were also evident.
A global interpretability analysis (Figure 3) likewise revealed distinctive patterns in how the LLM's factual knowledge transitions: the early layers tend to focus on entity resolution, the middle layers on accumulating knowledge about the subject, and the final layers on diverting attention elsewhere.
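One simple, illustrative way to quantify such global transitions is the overlap between the fact sets decoded at consecutive layers, sketched below. This Jaccard-style metric is an assumption for illustration rather than the paper's own measure; it expects layer graphs whose edges carry a "predicate" attribute, as in the earlier sketch.

```python
# Sketch: Jaccard overlap of decoded facts between consecutive hidden layers.
import networkx as nx

def fact_set(graph):
    return {(u, d["predicate"], v) for u, v, d in graph.edges(data=True)}

def layer_overlap(layer_graphs):
    layers = sorted(layer_graphs)
    overlaps = {}
    for prev, curr in zip(layers, layers[1:]):
        a, b = fact_set(layer_graphs[prev]), fact_set(layer_graphs[curr])
        overlaps[(prev, curr)] = len(a & b) / len(a | b) if a | b else 0.0
    return overlaps

# Toy usage with two dummy layer graphs.
g2, g8 = nx.MultiDiGraph(), nx.MultiDiGraph()
g2.add_edge("Eiffel Tower", "Paris", predicate="located_in")
g8.add_edge("Eiffel Tower", "Paris", predicate="located_in")
g8.add_edge("Paris", "France", predicate="capital_of")
print(layer_overlap({2: g2, 8: g8}))   # {(2, 8): 0.5}
```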
These results indicate that the proposed method is effective in elucidating the factual knowledge resolution process of LLMs. The analytical approach using knowledge graph representation is expected to bring new insights into understanding the internal mechanisms of language models.
Discussion and Conclusion
The main contribution of this study is an end-to-end framework for extracting factual knowledge from the latent representations of LLMs and representing its layer-by-layer evolution in a temporal knowledge graph. The framework makes it possible to identify the factual knowledge an LLM refers to when judging the truth or falsity of a claim, to analyze how that knowledge transitions across layers, and to discover distinctive patterns.
The local interpretability analysis (Figure 3) revealed details of the LLM's internal mechanisms, including entity resolution and failures in multi-hop reasoning. The global analysis (Figure 7), on the other hand, revealed interesting patterns: entity resolution in the early layers, accumulation of subject knowledge in the middle layers, and impoverished knowledge representations in the final layers. The poor knowledge representation in the final layers may be due in part to attention concentrating on the in-context examples.
Thus, the proposed method provides new insights into how language models use knowledge. The knowledge-graph-based approach effectively visualizes the internal structure of the model and is expected to improve interpretability. Further development is anticipated, such as extending the method to longer input contexts.
The results of this research will have important implications for the reliability and safety of artificial intelligence technologies, and the elucidation of the internal mechanisms of LLMs should contribute to solving important issues such as improving the predictability of AI systems and removing biases.