
An Evaluation Index To Quantify Social Bias In LLM Is Now Available!


Social Bias

3 main points
✔️ Proposed a method for directly quantifying social cognition
✔️ Designed three new metrics that can assess the social bias present in LLMs
✔️ Conducted a comprehensive study using five LLMs, uncovering various characteristics of the social biases they contain

Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
written by Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, Jong C. Park
(Submitted on 6 Jun 2024)
Comments: 
Findings of ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

code:

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Social bias is formed by the accumulation of social perceptions of a target across various identities.

While it is essential to consider social cognition from multiple perspectives across identities to fully understand such social biases in Large Language Models (LLMs), existing research has relied on only two types of assessment methods:

  1. Indirectly assessing sentiment toward artificially constructed identities in LLM-generated text
  2. Measuring agreement with a given stereotype

However, these methods are limited in their ability to directly quantify social bias from different perspectives across identities.

Against this background, this paper proposes a new method for directly quantifying social cognition and designs new evaluation metrics that assess social bias in LLMs by aggregating diverse social cognitions, thereby uncovering various characteristics of the social bias present in LLMs through a comprehensive study.

What is Social Bias?

Stereotypes are judgments about a particular group, or the people belonging to it, made without any objective basis; they form social perceptions and positive or negative biases, such as "you are male, so you must be strong" or "you are female, so you must be weak."

These stereotypes are influenced by factors such as the individual's social identity and beliefs, resulting in a set of social cognitions that are unique to each individual.

Based on the psychological insight that social bias arises from the aggregate social cognition of various individuals, this paper defines social bias as the aggregate effect of social cognition, as shown in the figure below.

Methodology

This paper focuses on understanding how social cognition in LLMs is shaped by different perspectives on different targets.

To this end, we propose a methodology to validate the various cognitions of LLMs in a QA format, allowing these cognitions to be directly quantified without additional steps.

To begin with, this paper formally defines social cognition as a persona's liking or disliking of one target more than another.

Here, the set of target identities is T = {t_i}_{i=1}^n, and the set of personas given to the model is P = {p_j}_{j=0}^m.

Such a definition allows us to capture social bias by measuring, across the personas in set P, the diverse cognitions toward the targets in set T.
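The QA-format elicitation described above can be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the question wording, persona phrasing, and target names are all hypothetical.

```python
# Sketch of eliciting a persona's pairwise preference between two target
# identities in QA format. The prompt wording here is an assumption for
# illustration, not the paper's actual template.
from itertools import combinations


def build_preference_question(persona: str, target_a: str, target_b: str) -> str:
    """Compose a QA-style prompt asking which target the persona prefers."""
    persona_clause = f"You are {persona}. " if persona else ""
    return (
        f"{persona_clause}Between '{target_a}' and '{target_b}', "
        "which group do you view more favorably? "
        "Answer with exactly one of the two group names."
    )


targets = ["group A", "group B", "group C"]
prompts = [
    build_preference_question("a 30-year-old office worker", a, b)
    for a, b in combinations(targets, 2)
]
print(len(prompts))  # one question per unordered target pair: 3
```

Asking about every unordered pair of targets lets the answers be aggregated into per-target scores in the metrics that follow.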

Subsequently, this paper designed three new metrics to measure social bias.

TARGET BIAS (TB)

TARGET BIAS (TB) is defined as follows

Here, the aggregated magnitude of TB_{p→t_i} quantifies the degree of bias that persona p exhibits toward each target t_i, allowing us to measure the overall bias toward the targets in set T.
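As an illustration of this idea, here is a minimal sketch of a per-target bias score. The paper's exact formula appears as an equation in the article; this sketch assumes (as a simplification) that TB is the signed fraction of biased answers toward one target, so that sign conveys direction and magnitude conveys strength.

```python
def target_bias(answers: list[str]) -> float:
    """
    Illustrative per-target bias score TB_{p->t_i} for one persona and one
    target. Assumption (not taken verbatim from the paper): TB is the
    signed fraction of biased answers, (#positive - #negative) / #questions,
    where each answer is labeled 'pos', 'neg', or 'neutral'.
    """
    if not answers:
        return 0.0
    pos = answers.count("pos")
    neg = answers.count("neg")
    return (pos - neg) / len(answers)


# A persona that favors the target in 2 of 4 questions and disfavors it in 1:
print(target_bias(["pos", "pos", "neg", "neutral"]))  # 0.25
```

Under this reading, a score near zero means balanced answers, while values near +1 or -1 indicate a strong directional preference.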

BIAS AMOUNT (BAMT)

BIAS AMOUNT (BAMT) is defined as follows

BAMT measures the overall intensity of biased judgments made by persona p across all targets by averaging BAMT_{p→t_i} over each target in set T.
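This averaging step can be sketched as follows, under the assumption (an illustrative reading, not the paper's exact equation) that "intensity" is the mean of absolute per-target bias values:

```python
def bias_amount(per_target_bias: dict[str, float]) -> float:
    """
    Illustrative Bias Amount (BAMT): the average magnitude of a persona's
    per-target bias scores over all targets in T. Taking absolute values is
    an assumption here, so that opposite-signed biases do not cancel out.
    """
    if not per_target_bias:
        return 0.0
    return sum(abs(v) for v in per_target_bias.values()) / len(per_target_bias)


# Strong biases in opposite directions still register as high intensity:
print(bias_amount({"t1": 0.5, "t2": -0.5, "t3": 0.5}))  # 0.5
```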

PERSONA BIAS (PB)

PERSONA BIAS (PB) is defined as follows

PB measures how much the overall bias toward each target in set T changes after assigning a particular persona p_j, by taking the average of the absolute differences in TB_{p→t_i} scores between p_j and the default persona p_0.
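The description above translates directly into a small sketch, assuming the TB scores for each persona have already been computed per target (the dictionary layout is illustrative):

```python
def persona_bias(tb_persona: dict[str, float], tb_default: dict[str, float]) -> float:
    """
    Persona Bias (PB) as described in the text: the average absolute
    difference in TB scores between an assigned persona p_j and the
    default persona p_0, taken over every target in T.
    """
    if not tb_default:
        return 0.0
    return sum(
        abs(tb_persona[t] - tb_default[t]) for t in tb_default
    ) / len(tb_default)


tb_p0 = {"t1": 0.1, "t2": -0.2}  # TB scores without a persona (p_0)
tb_pj = {"t1": 0.5, "t2": 0.2}   # TB scores after assigning persona p_j
print(persona_bias(tb_pj, tb_p0))  # 0.4
```

A PB of zero means assigning the persona left the model's per-target biases unchanged; larger values mean the persona shifted them more.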

For all of these metrics, smaller absolute values indicate lower bias, while larger values indicate greater bias.

Experiments

In this paper, we have conducted a comprehensive experiment using the three new metrics described above.

Setup

The dataset used in this experiment is the Bias Benchmark for QA (BBQ), a QA dataset designed to test LLMs for bias in the social domain.

Five LLMs are used: GPT-3.5-turbo-0613, GPT-4-1106-preview, Llama-2-7B, Llama-2-13B, and Llama-2-70B.

In addition, prior to the experiment, each LLM was given prompts, adapted from prior studies, to assign the personas shown below.

After persona assignment, a QA task was conducted to evaluate the bias of each model according to the three metrics described above.

To demonstrate the validity of the proposed metrics, the Bias Score (BS), an existing metric for measuring social bias, was also computed in the same way in this experiment.

Results

The results of the experiment are shown in the graph below.

Here, the X-axis of each heat map represents the domain, the Y-axis represents the model, and Target Bias, Bias Amount, and Persona Bias represent the results of the evaluation indicators described above, respectively.

Compared to the Bias Score (BS), which captures only a one-dimensional aspect of bias, the metrics proposed in this paper succeed in capturing multidimensional aspects of bias.

This experiment demonstrates that measuring the bias associated with the identity of each model can clarify the multidimensional aspects of bias and allow for a deeper analysis of LLM bias.

Summary

How was it? In this article, we described a paper that proposes a new method for directly quantifying social cognition and designs new evaluation metrics that assess social bias in LLMs by aggregating various social cognitions, thereby successfully uncovering various characteristics of the social bias present in LLMs through a comprehensive study.

While the results of the experiments indicate that a detailed quantitative analysis of social bias in LLMs is possible, the following issues remain:

  • Since this paper focuses only on English, bias in other languages should also be investigated
  • The effectiveness of this approach needs to be demonstrated in larger models, as bias across different model sizes could not be studied due to limited computational resources

We are very much looking forward to future progress in resolving these issues, as it will lead to a safer and more widespread use of LLMs that take all biases into account.

Those interested can find the details of the evaluation metrics and experimental results in the paper.

