Catch up on the latest AI articles

Machine Minds: Language Models Represent Beliefs Of Self And Others

3 main points
✔️ Theory of Mind (ToM) refers to the ability to understand what is going on in the mind of another person and to infer how that person feels or thinks.
✔️ Large language models (LLMs) appear to make human-like social inferences, but how they do so is not yet well understood.
✔️ By analyzing the internal activations of language models, the study finds that they represent the beliefs of both themselves and others.

Language Models Represent Beliefs of Self and Others
written by Wentao Zhu, Zhining Zhang, Yizhou Wang
(Submitted on 28 Feb 2024 (v1), last revised 29 Feb 2024 (this version, v2))
Comments: project page: this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

code:  

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Theory of Mind (ToM) refers to the ability to understand what is going on in another person's mind and to infer how that person feels or thinks. Large language models (LLMs) appear to make human-like social inferences, but how they do so is not yet well understood. By examining the internal activations of language models, this study finds that the models represent the belief states of both themselves and other agents. Manipulating these internal representations substantially changes the models' performance, revealing the role the representations play in social reasoning. Furthermore, the same representations generalize across a variety of social reasoning tasks.

Introduction

Studies have shown mixed results on whether LLMs can understand human mental states. Some studies indicate that LLMs can predict and understand them, while others suggest that they cannot do so fully, arguing that LLMs may merely reproduce surface patterns rather than demonstrate genuine understanding. To gain a deeper view of LLMs' social reasoning abilities, it is therefore important to study their internal representations. Specifically, this study examines whether LLMs can distinguish the mental states of others from their own, investigates whether the internal representations of LLMs can be manipulated to change how they reason about the states of mind of others, and assesses how well these abilities generalize to various social reasoning tasks.

Related Research

Theory of Mind (ToM) is the human ability to understand and infer the mental states of others and is the foundation of human social cognition. Even infants show signs of ToM from an early age, and this ability is important for understanding human behavior. A common way to assess ToM is the false-belief task, which requires participants to predict behavior based on another person's false belief. There have long been attempts to build machines with human-like ToM capabilities, most recently using methods such as meta-learning and Bayesian inference, and advances in LLMs have accelerated the exploration of ToM capabilities and led to various benchmarks. The present research focuses on the internal mechanisms of ToM inference in LLMs.

Proposed Methodology and Experiments

Belief Representation in Language Models

Here we seek to understand how language models represent the thoughts and beliefs of other people and characters, since the human ability to take another's perspective and understand their beliefs is critical for social interaction and communication. In this study, the ability of language models to infer others' beliefs from text is tested by probing their internal activations: as the model reads short stories, its activations are extracted and simple linear classifiers (probes) are trained on them to decode the belief states of the characters. In this way, we analyze what patterns and features the language model encodes that allow it to infer others' beliefs from the information in the text.
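To make the probing idea concrete, here is a minimal sketch of how one might extract activations from a pretrained model and fit a linear belief probe. The checkpoint name, the layer choice, and the two toy stories with their labels are illustrative assumptions, not the paper's actual setup.

```python
# Minimal probing sketch (assumption: a HuggingFace causal LM; the stories,
# labels, and layer choice below are illustrative, not the paper's data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.eval()

def final_token_activation(text: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the last token at the chosen layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = lm(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]

# Toy labeled stories: 1 = the protagonist's belief is true, 0 = it is false.
stories = [
    "Anna puts the milk in the fridge and stays in the kitchen.",
    "Anna puts the milk in the fridge and leaves; Bob then moves it to the table.",
]
labels = [1, 0]

X = torch.stack([final_token_activation(s) for s in stories]).float().numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)  # linear belief probe
print(probe.score(X, labels))
```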

The study also visualizes how a language model represents someone's beliefs in a passage, using plots of the representation space to show how the model encodes another agent's belief state from the information in the text. This makes the inner workings of the language model easier to inspect and reveals similarities and differences with human belief understanding. The linear separability diagram of belief representations below illustrates a typical representation space.

In (A), the oracle's belief state can be accurately estimated by a linear model, but the protagonist's cannot. The red and blue lines represent the linear decision boundaries for the oracle and the protagonist, respectively.
In (B), the belief states of both the oracle and the protagonist can be accurately modeled with a linear model.
(C) further illustrates the decision boundaries for joint belief-state estimation using a multinomial logistic regression model, with the arrows indicating the direction of the probe weights for each class.
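As a rough sketch of the joint probe in panel (C), the snippet below fits a multinomial logistic regression over the four combinations of (protagonist, oracle) belief states. The features are random placeholders standing in for model activations; everything here is illustrative rather than the paper's implementation.

```python
# Sketch of a joint belief probe over (protagonist, oracle) belief states.
# X is a random placeholder for model activations, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4096))              # stand-in for hidden activations
protagonist = rng.integers(0, 2, size=200)    # 0 = false belief, 1 = true belief
oracle = rng.integers(0, 2, size=200)
joint = 2 * protagonist + oracle              # four joint classes

# With more than two classes, the default lbfgs solver fits a multinomial
# (softmax) model; probe.coef_ holds one weight direction per joint class,
# which is what the arrows in panel (C) depict.
probe = LogisticRegression(max_iter=1000).fit(X, joint)
print(probe.coef_.shape)  # (4, 4096)
```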

Manipulation of Belief Representations

Here we investigate how the belief representations of language models can be manipulated. Specifically, we intervene on the model's internal representations to alter its social reasoning behavior and assess the impact. First, we use a benchmark called BigToM to evaluate the ability of language models to understand beliefs. The benchmark measures belief reasoning through a variety of social inference tasks, each of which requires inferring an agent's beliefs from its actions and perceptions. We then intervene on the internal representations of the language model to investigate how this affects its social reasoning ability.
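As an illustration of how a two-choice belief question of this kind might be scored, the sketch below compares the model's log-likelihood of two candidate answers appended to a story. The story, the answer options, and the log-probability scoring scheme are assumptions made for illustration; BigToM's actual format and evaluation protocol may differ.

```python
# Hedged sketch of scoring a two-choice belief question by comparing the
# log-likelihood of each candidate answer (the story and options are invented).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # assumed evaluation model
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.eval()

def option_logprob(context: str, option: str) -> float:
    """Sum of log-probabilities of the option tokens given the context."""
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = lm(full_ids).logits.log_softmax(-1)
    target = full_ids[0, ctx_len:]             # option tokens only
    preds = logprobs[0, ctx_len - 1 : -1]      # predictions for those positions
    return preds.gather(-1, target[:, None]).sum().item()

story = ("Noor brews a latte with oat milk. While she is away, a coworker "
         "swaps it for whole milk, and Noor does not see the swap. "
         "Noor believes the latte contains")
options = [" oat milk.", " whole milk."]
scores = [option_logprob(story, o) for o in options]
print(options[max(range(len(options)), key=lambda i: scores[i])])
```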

Forward-belief tasks infer an agent's belief from what it perceives, forward-action tasks predict its next action, and backward-belief tasks infer its belief from its observed actions; these tasks mimic inference patterns used in everyday interactions. Concretely, we manipulate the activations of the language model's attention heads, steering them in a particular direction. This modifies the represented belief of the agent and affects the model's performance.
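A minimal sketch of this kind of activation intervention is shown below: a forward hook shifts the output of one attention block along a fixed direction with strength α. The layer index, the steering direction (random here), and α are placeholders; in the paper, the directions come from the learned belief probes rather than from random vectors.

```python
# Sketch of steering: add alpha * direction to one attention block's output
# via a forward hook (layer, direction, and alpha are placeholder assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # assumed model
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
lm.eval()

layer_idx = 15                                  # assumed intervention layer
direction = torch.randn(lm.config.hidden_size)  # placeholder steering direction
direction = direction / direction.norm()
alpha = 4.0                                     # intervention strength

def steer(module, inputs, output):
    # Attention modules return a tuple whose first element is the hidden states.
    hidden = output[0] + alpha * direction.to(output[0].dtype)
    return (hidden,) + tuple(output[1:])

handle = lm.model.layers[layer_idx].self_attn.register_forward_hook(steer)
prompt = "Noor believes the latte contains"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = lm.generate(ids, max_new_tokens=10, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # detach the hook so later calls run without the intervention
```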

Model performance on the BigToM benchmark is compared under true-belief (TB) and false-belief (FB) conditions. The models perform well under true beliefs but considerably worse under false beliefs; Mistral in particular is biased toward the wrong answers. This comparison suggests that the models are weaker at reasoning about the false beliefs of others.

We investigated the effect of different intervention intensities α on the Forward Belief task using Mistral-7B. The results show that as the intervention intensity increases, "invalid" responses increase and the model fails to answer in the expected format; that is, more responses become malformed or uncertain and are not recognized by the scoring mechanism.

The experiments showed that intervening along a specific direction improved the model's overall belief comprehension, and in particular its reasoning in the false-belief condition. We further investigated how well the belief representations generalize across different social reasoning tasks. The results showed that the intervention directions generalize to multiple tasks, suggesting that the language model captures common underlying causal variables across different social reasoning scenarios.

Conclusion

In this study, we explored the ability of large language models (LLMs) to represent the beliefs of others. The results showed that LLMs can distinguish between the beliefs of multiple agents and that manipulating these representations influences the social reasoning process. They also suggested that belief representations generalize across different social reasoning tasks.

Looking ahead, future work should explore improving belief representations during training and applying them in more complex AI systems. Research is also needed on a wider range of models and more complex situations, with the goal of developing ToM capabilities that are better aligned with human values. While this research offers new insight into the ToM capabilities of LLMs and may contribute to the future development of AI, further study and practice are essential.

 
