
HDA-SynChildFaces, A New Dataset With Synthetic Images, Could Improve Face Recognition Technology For Children.

Face Recognition

3 main points
✔️ A new dataset, HDA-SynChildFaces, was created by synthesizing facial images of children in various age groups.
✔️ Testing MagFace, ArcFace, and COTS on the new dataset showed that as the age of the faces decreased, the similarity scores between different people increased and recognition performance decreased.
✔️ The same tests showed that recognition errors tended to be higher for female, Black, and Asian subjects, and that this also held for children's faces.

Child Face Recognition at Scale: Synthetic Data Generation and Performance Benchmark
written by Magnus Falkenberg, Anders Bensen Ottsen, Mathias Ibsen, Christian Rathgeb
(Submitted on 23 Apr 2023)
Comments: Published on arXiv.
Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.


The paper presented here aims to improve the performance of face recognition models for children and provides a synthetic child face dataset. Over the past few years, face recognition systems have found practical applications in a variety of domains, such as immigration control and criminal investigation, but children's face recognition has not received enough attention.

The importance of facial recognition systems for children is now frequently discussed. For example, police can use them to locate abducted or lost children. Child sexual abuse material (CSAM) has also become a major problem: reported cases rose from 17 million in 2019 to 29.3 million in 2021, and the number of victims has skyrocketed. This is far more data than can be processed manually, including for victim identification. A facial recognition system that works for children would make it possible to analyze seized CSAM and identify victims quickly and accurately.

However, the construction of face recognition models generally requires a large amount of training data, and the acquisition and use of such data involves issues such as privacy and human rights. Children, in particular, are a target to be protected, and collecting data on children's facial images is currently very difficult.

Therefore, the paper presented here proposes a new method that first synthesizes adult face images, for which sufficient training data exists, and then converts them into children's face images. This makes it possible to work on face recognition for children without collecting real children's face data. The figure below shows a sample of HDA-SynChildFaces.

Construction of the "HDA-SynChildFaces" dataset

The dataset of synthesized child face images is created using the following process.

  1. Sampling: generate adult face images to create the initial dataset
  2. Filtering: remove low-quality or unwanted images from the initial dataset
  3. Racial balancing: equalize the racial distribution of the initial dataset
  4. Age conversion: convert each adult face image into child face images, which are then grouped into several age groups

The first two steps are "1. Sampling" and "2. Filtering". First, a generative network called "StyleGAN3" is used to generate the initial dataset. Next, the generated images are filtered by age and by quality. Age filtering uses an age estimation model called "C3AE," which estimates the age of each generated face and removes the image if the estimated age is lower than a predefined reference age. Quality filtering uses a quality score algorithm called "SER-FIQ", which rates the quality of an image on a scale from 0 to 1, where values closer to 1 indicate higher quality. The figure below shows a sample of accepted (a) and rejected (b) images based on the SER-FIQ score.
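The two filters can be pictured as a simple accept/reject rule. The following is a minimal sketch only: the threshold values, the `Candidate` record, and the function names are hypothetical, since the article does not state the actual cut-offs, and the age and quality values would in practice come from models such as C3AE and SER-FIQ.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the article does not give the actual values.
MIN_AGE = 20.0      # assumed reference age
MIN_QUALITY = 0.5   # assumed cut-off on the 0-to-1 quality scale


@dataclass
class Candidate:
    image_id: str
    estimated_age: float  # would come from an age estimator such as C3AE
    quality: float        # would come from a quality scorer such as SER-FIQ


def keep(c: Candidate) -> bool:
    """Accept an image only if it passes both the age and the quality check."""
    return c.estimated_age >= MIN_AGE and c.quality >= MIN_QUALITY


def filter_candidates(candidates):
    """Return the candidates that survive both filters, in input order."""
    return [c for c in candidates if keep(c)]


samples = [
    Candidate("a", 34.0, 0.81),
    Candidate("b", 12.5, 0.90),  # rejected: estimated age below the reference age
    Candidate("c", 45.0, 0.22),  # rejected: quality score too low
]
kept = filter_candidates(samples)
print([c.image_id for c in kept])
```

Applying both checks in one predicate keeps the rule easy to audit; whether the paper applies them jointly or in sequence does not change which images survive.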

Next, the boundaries are learned within StyleGAN3's latent space (the internal parameter space used to generate the images) in order to efficiently transform specific attributes (such as gender and age) of the synthesized face images. This is based on the method described in the InterFaceGAN paper. This boundary serves as a dividing line between attributes (e.g., "male" and "female"). To find this boundary, we use a Support Vector Machine (SVM). This boundary allows us to vary the gender attributes of the same person to varying degrees, as shown in the figure below.
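In the InterFaceGAN formulation, editing an attribute amounts to moving a latent code along the unit normal of the learned hyperplane. As a sketch of that idea (the symbols are not taken from the article, and it is an assumption that the paper applies it in exactly this form to StyleGAN3's latent space):

```latex
\mathbf{z}_{\text{edit}} = \mathbf{z} + \alpha\,\mathbf{n}, \qquad \lVert \mathbf{n} \rVert = 1
```

Here $\mathbf{z}$ is the latent code of the original face, $\mathbf{n}$ is the normal vector of the SVM boundary for the attribute, and the scalar $\alpha$ controls how far, and in which direction, the attribute is changed.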

This SVM is trained using a large number of images (500,000) generated by StyleGAN3. Each image is classified with a pre-trained model for each attribute (gender, age, etc.). The SVM is then trained using only the most reliable top 10% and bottom 10% of the classification results. Inappropriate data that could not be successfully classified are removed. This is applied to all attributes dealt with in this paper.
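The selection of the most confidently classified samples can be sketched as follows. This is an illustration only: `select_confident` is a hypothetical helper, the scores are toy values, and the SVM fitting itself is left out.

```python
def select_confident(scores, fraction=0.10):
    """Return (positives, negatives): indices of the highest- and
    lowest-scoring samples, each making up `fraction` of the data."""
    k = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    negatives = order[:k]   # most confidently "attribute absent"
    positives = order[-k:]  # most confidently "attribute present"
    return positives, negatives


# 20 toy classifier scores, evenly spaced from 0.0 to 1.0.
scores = [i / 19 for i in range(20)]
pos, neg = select_confident(scores)
print(sorted(pos), sorted(neg))
```

The latent vectors of the selected positive and negative samples would then serve as the two classes for fitting a linear SVM, whose hyperplane is the attribute boundary.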

Step 3 is racial balancing, which adjusts the dataset so that races are evenly distributed. To change a subject from one race to another, as shown in the figure below, the learned race boundaries described earlier are used. First, a database of images and their latent vectors is created, and each image is classified by race. Then randomly selected subjects from the most frequent race are converted to the least frequent race. Repeating this process adjusts the distribution until all races are equally represented.
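The balancing loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: the group names are placeholders, `balance` is a hypothetical function, and relabeling a list entry stands in for actually editing a latent vector across a learned boundary.

```python
import random
from collections import Counter


def balance(labels, classes, seed=0):
    """Repeatedly move one random subject from the most common class to the
    least common class until class counts differ by at most one."""
    rng = random.Random(seed)
    labels = list(labels)
    while True:
        counts = Counter(labels)
        most = max(classes, key=lambda c: counts[c])
        least = min(classes, key=lambda c: counts[c])
        if counts[most] - counts[least] <= 1:
            return labels
        candidates = [i for i, label in enumerate(labels) if label == most]
        # Relabeling stands in for shifting the subject's latent vector.
        labels[rng.choice(candidates)] = least


classes = ["g1", "g2", "g3", "g4"]  # placeholder group names
skewed = ["g1"] * 70 + ["g2"] * 20 + ["g3"] * 9 + ["g4"] * 1
balanced = balance(skewed, classes)
print(Counter(balanced))
```

Each iteration narrows the gap between the largest and smallest group, so the loop terminates once the counts are as even as the total allows.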

The figure below shows a sample of the racial distribution before and after the racial balancing procedure. As can be seen from Figure (a) below, 70% of the subjects initially sampled were classified as white, while only 0.5% were black, resulting in a very skewed distribution, which was adjusted to an even distribution as shown in Figure (b) below.

Step 4 is age transformation, which also uses the SVM boundary method described above. However, this method may deform the face unnaturally if the latent vector is shifted too far. The figure below shows an example of excessive facial deformation. The first three images (framed in green) realistically depict the same person gradually becoming younger, while the last three images (framed in red) show that when the face is pushed too far in the age direction, it loses its human-like quality and becomes unnatural.

We use Principal Component Analysis (PCA) as an automatic way to find such problems. The two most important principal components form a distribution, and images that are too far from the center of the distribution are judged to be abnormal. If an image is determined to be anomalous, it is most likely unnaturally deformed and is removed from the data set.
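A minimal sketch of this kind of PCA-based check is shown below. The distance rule (mean plus a multiple of the standard deviation), the `n_std` parameter, and the random toy data are assumptions made for illustration; the article only says that images too far from the center are treated as abnormal.

```python
import numpy as np


def pca_outliers(X, n_std=3.0):
    """Project the rows of X onto their first two principal components and
    flag rows whose distance from the center exceeds mean + n_std * std."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:2].T
    dist = np.linalg.norm(proj, axis=1)
    return dist > dist.mean() + n_std * dist.std()


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))  # toy stand-in for image feature vectors
X[0] += 25.0                    # one row pushed far away from the rest
flags = pca_outliers(X)
print(int(flags.sum()), bool(flags[0]))
```

Rows flagged `True` would correspond to images that are removed from the dataset.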

The figure below is a sample of images determined to be abnormal by this method.

The "HDA-SynChildFaces" dataset is composed of 1,652 different subjects that went through the process described above. The 1,652 generated subjects are 20 years of age or older, and an age transformation is applied to each of them, resulting in synthetic images for five different child age groups. This creates six datasets (one adult and five children). Each image in these six datasets has 18 variations, for a total of 1,652 × 6 × (18+1) = 188,328 synthesized images.
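As a quick check of the arithmetic in the formula above, the product evaluates to 188,328. The variable names are only for illustration.

```python
subjects = 1652
datasets = 6      # one adult set plus five child age groups
variations = 18   # per image, in addition to the reference image itself

total = subjects * datasets * (variations + 1)
print(total)  # 188328
```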

Synthesized subjects are also classified as male (M) or female (F). This classification is done to test whether the performance of the face recognition system varies by gender as well as age group. The number of images in each group is shown in the table below. Of the subjects, 40.3% are female and the remaining 59.7% are male, which results in a slight bias. We attribute this bias to the quality filtering process.

The racial distribution of each subject is also adjusted for evenness, as shown in the table below. The dataset is divided into subsets by race to see if the face recognition system has a bias toward certain races and if that bias varies across age groups. We have adjusted for an even distribution of race, but post-processing has resulted in slight inequalities.


The "HDA-SynChildFaces" dataset is a synthetic collection of child face images with various characteristics, which is used to evaluate how accurately a face recognition system can recognize a child's face. The face recognition systems used for this evaluation are two state-of-the-art open source systems, ArcFace and MagFace, and a commercial-off-the-shelf (COTS) face recognition system.

The results of the experiment, as shown in the table below, consistently show across all face recognition systems that recognizing children's faces (especially in the younger age groups) is more difficult and error-prone than recognizing adult faces. This can be seen from the fact that the mean of the "Non-mated" scores in the table increases as the age group gets younger, which means that different people are more often mistaken for the same person. The standard deviation also increases for the younger age groups, indicating that the scores are less consistent and more variable.
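Why higher non-mated scores lead to more errors can be illustrated with the false match rate at a fixed decision threshold. The scores and the threshold below are made-up toy values, not results from the paper, and `false_match_rate` is a hypothetical helper.

```python
def false_match_rate(non_mated_scores, threshold):
    """Fraction of non-mated comparisons scoring at or above the threshold,
    i.e. different people wrongly accepted as the same person."""
    hits = sum(score >= threshold for score in non_mated_scores)
    return hits / len(non_mated_scores)


# Toy similarity scores between different people (illustrative only).
adult_like = [0.05, 0.10, 0.12, 0.18, 0.20, 0.22, 0.25, 0.31]
child_like = [0.15, 0.22, 0.28, 0.33, 0.35, 0.41, 0.44, 0.52]
threshold = 0.30

fmr_adult = false_match_rate(adult_like, threshold)
fmr_child = false_match_rate(child_like, threshold)
print(fmr_adult, fmr_child)  # 0.125 0.625
```

With the threshold held fixed, shifting the non-mated distribution upward pushes more comparisons over the threshold, which is the effect described for the younger age groups.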

The d' score is another important indicator. It indicates how well a system can separate the "mated" (same person) and "non-mated" (different people) score distributions; the higher the score, the better the performance. This score also decreases as the age group gets younger. Taken together, these results indicate that while the face recognition systems can recognize adult faces with relatively high accuracy, they have difficulty recognizing children's faces and are prone to misrecognition. The error rate is particularly high for the faces of young children, where recognition accuracy also varies widely.
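A common definition of d' divides the gap between the two means by the pooled standard deviation. The sketch below uses that standard definition with sample variances; it is an assumption that the paper computes d' in exactly this way, and the scores are toy values.

```python
from math import sqrt
from statistics import mean, variance


def d_prime(mated, non_mated):
    """Decidability index: distance between the mated and non-mated means,
    normalized by the root of the averaged variances."""
    mu_m, mu_n = mean(mated), mean(non_mated)
    var_m, var_n = variance(mated), variance(non_mated)
    return abs(mu_m - mu_n) / sqrt(0.5 * (var_m + var_n))


mated = [0.8, 0.9, 1.0]
well_separated = [0.1, 0.2, 0.3]  # non-mated scores far below mated scores
overlapping = [0.4, 0.5, 0.6]     # non-mated scores shifted upward

d_high = d_prime(mated, well_separated)
d_low = d_prime(mated, overlapping)
print(round(d_high, 3), round(d_low, 3))
```

When the non-mated scores move closer to the mated scores, d' drops, which matches the pattern reported for the younger age groups.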

In addition, differences in the performance of face recognition systems by gender and race are evaluated. Similar trends are observed for MagFace, ArcFace, and COTS, so the results are summarized together here.

First, the results for gender show that in the 20+, 13-16, and 10-13 age groups, males have higher d' values (the ability of the face recognition system to distinguish between mated and non-mated samples) than females. In the younger age groups, however, this value is slightly higher for females. The mean non-mated score is higher for males in all age groups, while the mean mated score is about the same for males and females. Overall, the face recognition system is slightly better at distinguishing between mated and non-mated samples for males in the older groups, whereas for children aged 1-4 it performs better for females.

Next, the results for race show that white subjects had the highest d' scores in the child age groups, while Latino-Hispanic subjects scored slightly higher in the adult age group. Black subjects had the lowest d' scores, followed by Indian subjects. Performance decreased for all races in the youngest age group, but the ranking of races from lowest to highest performance did not change. These results show that face recognition systems perform differently depending on gender, race, and age, so it is important to take these factors into account when designing and tuning them.


The paper proposes a new dataset, HDA-SynChildFaces, which consists of synthetic images of children's faces from various age groups, further balanced by demographics. It also evaluates the performance of existing face recognition systems, MagFace, ArcFace, and COTS, on this new dataset.

First, the "Mated" scores, which the face recognition system assigns to pairs of images of the same person, do not change significantly with the age of the face. This indicates that age does not greatly affect the system's ability to recognize two images as the same person. However, the "Non-mated" scores, which the system assigns to pairs of different people, tend to increase as age decreases. In other words, the younger the faces, the more likely the system is to mistake two different people for the same person. The error rate (how many misrecognitions occur) likewise increases as age decreases. This indicates that the recognition accuracy of the face recognition systems tends to drop for younger faces.

Gender differences were also observed, with females having higher error rates and more misidentifications than males. However, this trend does not necessarily hold for the youngest group, ages 1-4. Differences by race are also observed, with recognition accuracy falling for all races as age decreases. Black and Asian subjects have particularly high error and misrecognition rates and lower recognition performance than white and Latino-Hispanic subjects. The same trend is observed in face recognition of children.

Over the past decade, facial recognition has reached a practical level, and more research has recently been devoted to improving the accuracy of face recognition models for children. It is hoped that this dataset will lead to further research and development of face recognition models for children, which will help protect children from crime.

Takumu
I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.
