
IndiBias, A New Dataset For Measuring India-specific Social Biases

Large Language Models

3 main points
✔️ IndiBias is a new dataset for quantifying stereotypes in language models, focusing on India's diverse identities
✔️ Composed of modified sentence pairs and newly created sentences that reflect India's unique social context, IndiBias offers a more realistic social perspective
✔️ Expected to promote equitable AI technology in Indian society

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
written by Nihar Ranjan Sahoo, Pranamya Prashant Kulkarni, Narjis Asad, Arif Ahmad, Tanu Goyal, Aparna Garimella, Pushpak Bhattacharyya
(Submitted on 29 Mar 2024 (v1), last revised 3 Apr 2024 (this version, v2))
Comments: Published on arxiv.
Subjects: Computation and Language (cs.CL)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Large-scale language models are trained on vast amounts of textual data and have shown excellent performance on many natural language processing tasks. However, recent research has revealed the existence of biases and stereotypes in natural language processing datasets and models. These models risk reproducing harmful biases in a variety of applications and may have adverse effects on specific target groups. To address this issue, there is a need to develop high-quality benchmark datasets that measure the extent to which models prioritize stereotypical associations.

India is a country of diverse linguistic, religious, caste, and regional identities, and there is an urgent need for a bias assessment and mitigation framework appropriate to this diversity. Given India's diverse user base, the impact of language model bias is even more pronounced. Existing benchmark datasets, which focus primarily on English and Western culture, lack the information necessary to understand and mitigate bias in India's unique context. Furthermore, these datasets have been found to lack the reliability necessary to accurately measure the extent to which natural language processing systems reproduce stereotypes.

To address these challenges, this paper proposes a new dataset called IndiBias. The dataset is designed to measure and quantify language model biases and stereotypes in the Indian social context, covering key Indian social identities such as gender, religion, caste, age, region, appearance, and occupation/socioeconomic status, as well as their intersections (gender-religion, gender-caste, gender-age). The languages covered are Hindi and English.

Through these efforts, we aim to provide deep insights and concrete solutions to the language model bias problem.

Social Bias in India

India has its own unique social biases, rooted in diverse social disparities such as caste, religion, and region. Caste-based prejudice has persisted over the years and remains a problem despite social efforts to eliminate it. Films such as Article 15 (2019), The Kashmir Files (2022), and Masaan (2015), and other entertainment media have highlighted the reality of caste- and class-based discrimination. Women from Dalit, Adivasi, non-designated tribal, and backward communities face social prejudices and stereotypes on a daily basis.

In a historical study, de Souza (1977) revealed the existence of various stereotypes about regional subgroups in India and showed a link between regional identity and character traits. More recently, Bhatt et al. (2022) presented data supporting this using Wikipedia, the IndicCorp-en corpus, and the language models MuRIL and mBERT.

In addition, social biases and stereotypes are multilayered, combining global elements with elements specific to a geocultural context. Global axes of social inequality include gender, age, and appearance, but even these global axes vary across demographics. Taking gender as an example, some prejudices and stereotypes about women are shared widely, while others are specific to a geocultural context and can differ greatly around the world.

For example, "women cannot do math" (S1) is a widely shared stereotype, whereas "women who wear traditional dress in Rajasthan are considered conservative" (S2) and "women who wear traditional dress in West Bengal are considered cultural messengers" (S3) show that the same attribute can carry opposite stereotypical associations depending on the region.

With the increasing adoption of natural language processing applications in the Indian legal, medical, educational, and media sectors, there is a need to build reliable, diverse, and high-quality benchmark datasets to measure the bias of contextual models. Such research is essential to promote the equitable use of technology across Indian society.

The IndiBias Dataset

The IndiBias dataset is specifically designed to fit India's unique social context. The dataset consists of modified sentence pairs from CrowS-Pairs (an existing benchmark dataset), sentences generated using IndiBias tuples, and template-based sentences created by leveraging the power of large-scale language models.

To capture the unique social context of India, such as region and caste, IndiBias tuples have been introduced. These tuples cover a wide variety of identities, including region, caste, religion, age, gender, appearance, and occupation/socioeconomic status, capturing stereotypes and prejudices that are often overlooked in existing data sets. Each tuple consists of an "identity term" and a "stereotypic attribute," where the identity term refers to a specific social group and the attribute indicates the concept stereotypically associated with that term.

The tuple creation process begins with the use of ChatGPT and InstructGPT to generate positive and negative attributes for each identity term. The generated attributes are evaluated by three annotators to determine if they reflect common stereotypes in Indian society, and those that are recognized as stereotypes by two or more annotators are selected. This approach ensures that the dataset is more realistic and reflects diverse social perspectives.
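To make the selection rule concrete, here is a minimal sketch of this majority-vote filter. The data structures and example candidates are hypothetical and only loosely based on the stereotypes (S2, S3) quoted earlier; the actual IndiBias annotation pipeline is not reproduced here.

```python
# Minimal sketch of the majority-vote filter described above.
# Candidate attributes and variable names are illustrative, not taken from IndiBias.
from collections import namedtuple

Candidate = namedtuple("Candidate", ["identity_term", "attribute", "votes"])

# Each vote is True if an annotator judged the attribute a common stereotype
# for the identity term in Indian society (three annotators per candidate).
candidates = [
    Candidate("women wearing traditional dress (Rajasthan)", "conservative", [True, True, False]),
    Candidate("women wearing traditional dress (West Bengal)", "cultural messengers", [True, True, True]),
    Candidate("women", "good at cricket", [False, False, True]),  # rejected: only 1/3 agree
]

def is_recognized_stereotype(candidate, min_agreement=2):
    """Keep an (identity, attribute) tuple only if at least `min_agreement` annotators agree."""
    return sum(candidate.votes) >= min_agreement

selected = [c for c in candidates if is_recognized_stereotype(c)]
for c in selected:
    print(f"{c.identity_term} -> {c.attribute}")
```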

Using these tuples, humans and large-scale language models work together to generate stereotyped sentence pairs, thereby covering a wider range of bias categories.


IndiBias offers a new perspective not found in existing models, helping to better understand social stereotypes and prejudices.

As part of the IndiBias project, we are developing a CrowS-Pairs-style dataset to assess the bias of large multilingual language models across seven social bias categories: gender, religion, age, caste, disability, appearance, and socioeconomic status. The original CrowS-Pairs is first adapted to the Indian context and then expanded using the IndiBias tuple dataset.

The original CrowS-Pairs dataset contains 1,508 sentence pairs designed to measure social bias in the United States. Each pair targets a particular group and its stereotypical attributes, with the second sentence differing slightly from the first in the target group or attribute. Categories deemed unsuitable for the Indian context were eliminated, and the filtering process focused on categories more relevant to Indian society, such as gender, age, disability, appearance, and socioeconomic status. In this process, 542 sentence pairs were selected; after machine translation with NLLB and Google Translate, they were reviewed by five annotators for accuracy.

The dataset also incorporates an approach in which tuples, humans, and large-scale language models work together to generate new stereotypical sentence pairs. Each tuple (a combination of identity and attribute) is used to have a large-scale language model generate natural-sounding sentences based on it. This allows the sentence pairs to be adapted to the Indian social context, producing sentences focused on specific categories such as religion and caste. Finally, these sentences were translated into parallel pairs in Hindi, with religion and caste bias categories accounting for 62.6% and 37.4% of the total, respectively.
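As a rough illustration of this minimal-pair construction, the sketch below builds a CrowS-Pairs-style pair from a single tuple. The template, group names, and field names are placeholders introduced for illustration; in the actual project, large-scale language models generate the sentences and humans review them.

```python
# Illustrative sketch of building a CrowS-Pairs-style minimal pair from one tuple.
# The template, group names, and field names are placeholders; in IndiBias the
# sentences are written by large language models and reviewed by humans.

def make_pair(template, target_group, contrast_group, attribute, label="stereo"):
    """Fill the same template twice so the two sentences differ only in the group term."""
    return {
        "sent1": template.format(group=target_group, attribute=attribute),
        "sent2": template.format(group=contrast_group, attribute=attribute),
        "label": label,
    }

pair = make_pair(
    template="People from {group} are often seen as {attribute}.",
    target_group="GROUP_A",    # identity term from an IndiBias tuple (placeholder)
    contrast_group="GROUP_B",  # contrasting identity term (placeholder)
    attribute="ATTRIBUTE",     # stereotypic attribute from the same tuple (placeholder)
)
print(pair["sent1"])
print(pair["sent2"])
```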

The project is more than just a translation effort; it is a rigorous review to ensure that the translated text accurately reflects the intent of the source text, and manual corrections are made as needed to select the appropriate translation for the context. This allows the Indian version of CrowS-Pairs to serve as a more accurate data set that captures the nuances specific to the region.

In addition, the IndiBias dataset examines intersectional bias, which affects individuals who belong to multiple minority groups or have complex social identities. This bias refers to the fact that individuals are affected not only by one identity dimension but also by compound biases arising from the intersection of multiple social categories. Here we focus on three main intersectional axes: gender and religion, gender and caste, and gender and age. To quantitatively measure the degree of bias, we use Sentence Embedding Association Tests (SEAT) to assess the bias in each model.
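For reference, a minimal sketch of a SEAT-style association test is shown below. It computes the standard WEAT effect size on sentence embeddings; the random embeddings and the choice of target/attribute sets are placeholders, and the paper's exact SEAT configuration may differ.

```python
# Minimal sketch of a SEAT-style effect size (WEAT applied to sentence embeddings).
# Embeddings here are random placeholders; in practice they would come from the
# language model under evaluation.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def seat_effect_size(X, Y, A, B):
    """Effect size d for target sets X, Y and attribute sets A, B (all embeddings)."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    pooled = np.array(x_assoc + y_assoc)
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(pooled, ddof=1)

# Placeholder embeddings, e.g. sentences about two intersectional target groups
# (X, Y) and two attribute sets (A = career terms, B = family terms).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 768)); Y = rng.normal(size=(8, 768))
A = rng.normal(size=(8, 768)); B = rng.normal(size=(8, 768))
print(round(seat_effect_size(X, Y, A, B), 3))
```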

In this way, the IndiBias dataset provides a data-driven approach to better understand and address intersectional biases in the Indian context. It is a composite dataset consisting of the Indian CrowS-Pairs (ICS), India-specific attribute tuples, and bleached sentences constructed along the various intersectional axes.

Experimental results

Here we quantify bias using the benchmark dataset, with the models listed in the table below.

The table below shows how the various models exhibit bias when analyzed with the IndiBias dataset. For each model, we count the number of sentence pairs for which the score of the first sentence (S1) exceeds that of the second sentence (S2) when the label is stereo (denoted n1), and the number of pairs for which S2 exceeds S1 when the label is anti-stereo (denoted n2). The model bias percentage is then defined as (n1 + n2) expressed as a percentage of the total number of sentence pairs.

The closer this percentage is to 100%, the more consistently the model endorses stereotypical statements, while a percentage closer to 0% indicates a preference for anti-stereotypical statements. Ideally, this percentage should be close to 50% for an unbiased model.
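For concreteness, the sketch below implements this percentage under the assumption that each sentence is scored with a masked language model's pseudo-log-likelihood (a common choice for CrowS-Pairs-style evaluation); the model name and field names are placeholders, and the paper's exact scoring may differ.

```python
# Sketch: score sentence pairs with a masked LM and compute the bias percentage
# defined above. Assumes pseudo-log-likelihood scoring; details may differ from the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-multilingual-cased"  # placeholder; any masked LM from the tables
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def bias_percentage(pairs):
    """pairs: dicts with 'sent1', 'sent2', and 'label' ('stereo' or 'anti-stereo')."""
    n1 = n2 = 0
    for p in pairs:
        s1, s2 = pseudo_log_likelihood(p["sent1"]), pseudo_log_likelihood(p["sent2"])
        if p["label"] == "stereo" and s1 > s2:
            n1 += 1  # S1 preferred on a stereo-labeled pair
        elif p["label"] == "anti-stereo" and s2 > s1:
            n2 += 1  # S2 preferred on an anti-stereo-labeled pair
    return 100.0 * (n1 + n2) / len(pairs)  # ~50% would indicate an unbiased model
```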

In English, Bernice, IndicBERT, and mT5 achieve scores very close to 50 compared to the other models, indicating a balanced performance. In contrast, in Hindi, XLMR scores 52.36, showing a different trend from the bias in English. This suggests that a model that scores equally for various types of bias in English sentences may not similarly reduce bias in Hindi. Notably, mT5 supports anti-stereotypic associations in both English and Hindi.

Overall, on the CrowS-Pairs (ICS) dataset, the models tend to show greater bias in English than in Hindi. This is likely due to differences in the language-specific pre-training corpora on which the models were trained, particularly in how stereotypes are captured in the Indian context. In the gender category, mBART shows the least bias in English and Bloom the least bias in Hindi. For religious bias, on the other hand, the models generally show a stronger bias in English, which may be because the English pre-training corpora take a global view of the concept of religious bias.

The paper evaluates gender-religion intersectional bias in 10 different multilingual models in English and Hindi, with the results presented in the table below. Scores are not reported for the Llama v2 and Mistral models because their pre-training data does not include Hindi. The assessment focuses on two attributes: occupation (career/family) and violence (nonviolence/violence). The career/family bias is a common stereotype about gender, while the violence bias relates to religion.

In particular, the India-specific models IndicBert and Muril show a high career/family bias across genders in both English and Hindi, indicating a more pronounced gender bias in the Indian context compared to Western models. mGPT also shows a particularly pronounced career/family bias in English sentences. The occupational bias is higher for the Muslim women group and somewhat lower for the Hindu women group. Interestingly, the bias between Hindu and Muslim women is higher in Hindi, and the violence bias is generally higher for the Muslim group in all models, but even more so in Hindi.

The results for gender-caste intersectional bias are shown in the table below. Most English models show a bias against the female group in terms of comfort. However, Bernice, IndicBert, and Muril show a bias against the upper-caste group when comparing genders across castes. In Hindi, the models show a bias against the male group in terms of comfort. When gender is held constant and caste is compared, most models show more comfort toward upper-caste groups, while mBART shows a bias toward lower castes in both languages.

Bias on the gender-age axis is generally very low in XLMR, although women are typically associated with more comfort in the India-specific models. However, this does not hold when comparing older female groups to younger male groups. The Bernice model for Hindi stands out for associating more comfort with men. The younger group is generally associated with more comfort than the older group, and these behaviors likely reflect the models' pre-training data.

Summary

This paper proposes a new dataset called "IndiBias," which focuses on the linguistic and cultural context of India to better understand social biases. We develop a broad set of identity and attribute tuples covering seven demographic categories: gender, religion, caste, age, region, appearance, and occupation. These are used to capture positive and negative stereotypes in Indian society.

Using a translation, filter, and correction approach, we have created an Indian version of the CrowS-Pairs dataset in English and Hindi, and have further extended this dataset with manually annotated sentence pairs using a tuple dataset. Using this extended dataset, we conducted a comprehensive analysis of biases in various language models and revealed the presence of crossing biases in the Indian context through analysis using SEAT.

Experiments show the importance of considering the combined effects of multiple dimensions when assessing bias in language models. The paper states that future prospects include incorporating instances of sexual orientation into the Indian CrowS-Pairs and further extending this dataset to multiple Indian languages. It is hoped that further insights will be gained through data from a broader social and cultural context.

