
Effect Of Racial Distribution Of Training Data On Bias In Face Recognition Models


3 main points
✔️ Analyze the impact of the racial distribution of training data on bias using 16 training datasets with different racial distributions
✔️ Analyze factors of racial bias from various perspectives, including verification accuracy, the Calinski-Harabasz Index, and UMAP
✔️ Hope that understanding the factors behind racial bias will help in building/selecting more suitable datasets for face recognition

The Impact of Racial Distribution in Training Data on Face Recognition Bias: A Closer Look
written by Manideep Kolla, Aravinth Savadamuthu
(Submitted on 26 Nov 2022)
WACVW 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)


The images used in this article are from the paper, the introductory slides, or were created based on them.


 Racial bias in facial recognition/authentication systems has become a social issue in recent years. Several studies have reported that racial bias in these systems has resulted in unfair situations for people of certain races. For example, in January 2020, an innocent Black man was wrongly arrested in the United States because a facial recognition system incorrectly matched his face to that of a criminal.

 Although facial recognition/authentication systems have improved significantly in accuracy over the past two decades, the problem of racial, gender and other biases has yet to be resolved. This is considered a serious problem because it can have a significant impact on a person's life, as in the false arrests mentioned earlier.

  The National Institute of Standards and Technology (NIST), which conducts the Face Recognition Vendor Test (FRVT) of face recognition/authentication models, has analyzed hundreds of algorithms and reported differences in accuracy by race. It found that False Match Rates (FMR) varied by a factor of 10 to 100 depending on demographics, and that this variation was much larger than the variation in False Non-Match Rates (FNMR), which differed by up to a factor of three. The report also states that FMR is highest for East African, West African, and East Asian faces, and lowest for Eastern European faces. In addition, many face recognition/authentication models by Chinese developers show lower FMRs for East Asians than other models do. Thus, there are many possible causes of bias in face recognition/authentication models, including statistical bias and human bias, and the types of bias vary widely.

 Since most face recognition/authentication models rely on large datasets, building robust models that work fairly and impartially for everyone requires investigating the composition of those datasets and its impact on accuracy.

 Therefore, to find clues to resolving bias in face recognition/authentication models, this paper examines various aspects of the training data, including racial distribution and clustering, similarity within and between races, quality of face images, and their impact on the bias.


 In this paper, we experiment with 16 training datasets with different racial distributions. These were created from two source datasets. The first is BUPT-BalancedFace, which contains face images drawn equally from four races, with 7,000 identities each for African, Asian, Caucasian, and Indian (roughly 300,000 images per race), for a total of about 1.25 million face images. Fifteen datasets are created from the possible combinations of these four races.

 The other is MS1MV3. Unlike BUPT-BalancedFace, MS1MV3 has a heavily skewed racial distribution: 14.5% African, 6.6% Asian, 76.3% Caucasian, and 2.6% Indian. It serves as the sixteenth dataset. All 16 training datasets are summarized in the Training Data column of the table below.

 The test data used is Racial Faces in-the-Wild (RFW), a test set for studying racial bias in face recognition. It has four subsets (African, Asian, Caucasian, and Indian), each containing approximately 3,000 individuals and 6,000 image pairs for face verification.

Impact of the racial distribution of training data on bias

  First, we evaluate verification accuracy for African, Asian, Caucasian, and Indian faces using Racial Faces in-the-Wild (RFW) on face recognition models trained on the 16 training datasets. The Accuracy Metrics (in %) in the table below show the accuracy on the test data for each race. "All" represents the accuracy on the test data for all races combined, and "STD" is the standard deviation of the per-race accuracies.

 We see that the standard deviation (STD) is highest for models trained on a single race and lowest for the model trained on all four races (African+Asian+Caucasian+Indian). Even among training sets with similar racial composition, the standard deviations differ considerably. In particular, among single-race models, the one trained on African face images has the lowest standard deviation, followed by Indian, while the one trained on Caucasian face images has the highest. The same pattern holds for models trained on three races: the model whose training data includes African and Indian faces has the lowest standard deviation, followed by the one including African and Asian faces, while the three-race model trained without African faces (and thus on Caucasian faces) has the highest.
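As a concrete illustration of how the table's "All" and "STD" columns are computed, here is a minimal sketch. The pair similarities below are synthetic placeholder values, not the paper's data; only the metric computation mirrors what the article describes.

```python
import numpy as np

def verification_accuracy(similarities, labels, threshold):
    """Fraction of pairs classified correctly at a fixed similarity threshold."""
    return float(np.mean((similarities >= threshold) == labels))

# Synthetic pair scores per race (placeholder distributions, not RFW data).
rng = np.random.default_rng(0)
races = ["African", "Asian", "Caucasian", "Indian"]
per_race_accuracy = {}
for race in races:
    genuine = rng.normal(0.7, 0.1, 3000)    # similarities of matched pairs
    impostor = rng.normal(0.3, 0.1, 3000)   # similarities of unmatched pairs
    sims = np.concatenate([genuine, impostor])
    labels = np.concatenate([np.ones(3000, bool), np.zeros(3000, bool)])
    per_race_accuracy[race] = verification_accuracy(sims, labels, 0.5)

accuracies = np.array(list(per_race_accuracy.values()))
# "STD" across per-race accuracies is the bias proxy the table reports.
print({r: round(a, 3) for r, a in per_race_accuracy.items()})
print("Mean:", round(accuracies.mean(), 3), "STD:", round(accuracies.std(), 4))
```

Note that the actual RFW protocol evaluates each race's 6,000 pairs with its own cross-validated threshold; the fixed threshold here is a simplification.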

 In addition, although MS1MV3 has a heavily skewed racial distribution, its model shows no significant difference in standard deviation compared with the model trained on BUPT-BalancedFace, whose racial distribution is almost perfectly balanced. This is likely because MS1MV3 is a much larger dataset than BUPT-BalancedFace, so the model trained on MS1MV3 has a lower overall error rate. However, while this indicates that the absolute differences in accuracy between races shrink when training on a large dataset, it does not mean that the bias itself is smaller.

Impact of the Calinski-Harabasz Index on Bias

 The table below shows the Calinski-Harabasz Index (CH) for the models trained on the 16 training datasets described earlier. CH-NT represents the Calinski-Harabasz Index for the races not included in the training data. If the training data contains only one race, the Calinski-Harabasz Index is not calculated. A higher CH value indicates stronger clustering by race, and thus more bias.

 The CH values help us understand bias in the face recognition/authentication model by measuring cluster distances between and within races. From the table, we can see that CH values are small for races included in the training data, while they are large for races not included. We also see that the CH values and the standard deviation do not have a monotonic relationship.
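The Calinski-Harabasz Index itself is a standard clustering metric (ratio of between-cluster to within-cluster dispersion). A small sketch with scikit-learn shows how it could be computed on embeddings labeled by race; the 2-D "embeddings" here are toy Gaussians, not real face features.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Toy face "embeddings": one Gaussian cluster per race (hypothetical 2-D features).
rng = np.random.default_rng(0)
centers = {"African": (0, 0), "Asian": (5, 0), "Caucasian": (0, 5), "Indian": (5, 5)}
X = np.vstack([rng.normal(c, 1.0, size=(500, 2)) for c in centers.values()])
labels = np.repeat(np.arange(len(centers)), 500)

# Well-separated race clusters -> large CH, which the paper reads as more bias.
ch = calinski_harabasz_score(X, labels)
print(f"Calinski-Harabasz Index: {ch:.1f}")
```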

Effect of face image quality on the bias

 To understand whether face image quality affects bias, we use a technique called Face Image Quality Assessment (FIQA) to compute an image quality score for each face image in the training and test data. The higher the score, the higher the quality of the face image.
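The paper's specific FIQA model is not reproduced here. As a loose, clearly hypothetical stand-in, a sharpness-style score illustrates the general idea of mapping each face image to a scalar quality value:

```python
import numpy as np

def quality_score(image):
    """Hypothetical stand-in for an FIQA score: mean gradient magnitude (sharpness).
    Real FIQA methods are learned models; this is only an illustration."""
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 255.0, 112), (112, 1))  # gentle ramp: few edges
noisy = rng.integers(0, 256, (112, 112)).astype(float)    # high-frequency detail
print(quality_score(noisy), ">", quality_score(smooth))   # sharper image scores higher
```

Per-race medians and means like those in the paper's table would then just be aggregates of these scores over each subset.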

The figure below shows the distribution of image quality scores by race in the training and test data. The table below also shows the median and average image quality scores for the training and test data.

From the figure and table, we can see that African face image quality is the highest and Asian image quality is the lowest in both the training and test data. This correlates with the earlier result that the standard deviation of the model trained on African face images is markedly smaller than that of models trained on other races. A likely explanation is that a face recognition/authentication model learns facial features more efficiently from high-quality face images than from low-quality ones, which also helps it recognize other races.

Influence of facial features on the bias

  As seen in the preceding table, when the training data includes African facial images, the accuracy for Africans is high and comparable to the accuracy for the other three races. However, when African face images are not included in the training data, the accuracy for Africans is very low, much lower than for the other races. When no African face images are included in the training data, the standard deviation (STD) is higher because the accuracy for Africans is clearly lower.

  The figure below shows the matrix of average cosine distances between unmatched face pairs across all races. As the figure shows, the low accuracy for Africans can be attributed to the fact that African faces are much less similar to the other races, while every other race has at least one highly similar race.

Effect of decision threshold on the bias

 The figure below shows, for each model, the decision threshold on the cosine distance at FMR = 0.1% for each race in the test data (RFW). Thresholds are shown for all models trained on the different racial distributions.

 We see that the decision threshold is highest for the races used in training. This suggests that models trained on a particular race are more confident when recognizing that race, and thus have higher decision thresholds. This holds for all racial distributions. Similarly, in Figure (d), the model trained on MS1MV3 shows the highest decision threshold for Caucasians, while the model trained on BUPT-BalancedFace shows comparable decision thresholds for all races. This is because the MS1MV3 dataset contains a disproportionately large number of Caucasian faces.
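Picking a decision threshold at a target FMR reduces to taking a quantile of the impostor (unmatched-pair) score distribution. The sketch below works with cosine similarity rather than the paper's cosine distance (the two are mirror images), on synthetic scores:

```python
import numpy as np

def threshold_at_fmr(impostor_similarities, fmr=0.001):
    """Similarity threshold such that only a fraction `fmr` of unmatched
    pairs is accepted. A pair is accepted when similarity >= threshold,
    so the threshold is the (1 - fmr) quantile of the impostor scores."""
    return float(np.quantile(impostor_similarities, 1.0 - fmr))

rng = np.random.default_rng(0)
impostor = rng.normal(0.3, 0.1, 100_000)  # toy similarities of unmatched pairs
t = threshold_at_fmr(impostor, fmr=0.001)
print(f"threshold @ FMR=0.1%: {t:.3f}")
# Sanity check: roughly 0.1% of impostor pairs exceed the threshold.
print(np.mean(impostor >= t))
```

Computing this quantile per race, as the paper does, makes the race-dependence of the operating point directly visible.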


 This paper examines how the racial distribution of the training data, differences in facial appearance by race, and differences in image quality affect the racial bias of face recognition/authentication models. As many studies have already shown, variation in the racial distribution of the training data has a significant impact. Unsurprisingly, face recognition/authentication accuracy drops for races not included in the training data. However, even matching the racial distributions does not necessarily guarantee a bias-free face recognition/authentication model.

  Not only the variation in the racial distribution of the training data but also differences in image quality and in the facial features of each race have a significant impact on the bias. The paper also investigates whether the clustering of face images by race can serve as an indicator for examining bias. In addition, although not discussed in this article, it uses UMAP projections to visualize the clustering of face images and investigates the role of gender in that clustering.

This paper offers several ideas on what to look for in training data to understand bias in face recognition/authentication models, beyond racial distribution alone. It is hoped that it will help us better understand the impact of training data on bias and help us select and build more suitable datasets and face recognition/authentication algorithms.
