"SynFace": Building a Face Recognition Model with Generated Face Images
3 main points
✔️ Overcome problems such as label noise and privacy of traditional large datasets by building face recognition models on generated face images
✔️ Introduce Identity Mixup and Domain Mixup
✔️ Analysis of how the number of samples per ID and the number of IDs in the generated face image dataset affect accuracy, showing that the number of IDs is more important
SynFace: Face Recognition with Synthetic Data
written by Haibo Qiu, Baosheng Yu, Dihong Gong, Zhifeng Li, Wei Liu, Dacheng Tao
(Submitted on 18 Aug 2021 (v1), last revised 3 Dec 2021 (this version, v2))
Comments:Accepted by ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
outline
Recently, large-scale datasets for face recognition have been released, and they have greatly contributed to improving the accuracy of face recognition. However, collecting face images for such datasets is difficult, and problems such as "label noise" and "privacy" have been pointed out.
Label noise refers to mislabeled face images. Conventional datasets are mainly built from face images of celebrities uploaded to the Internet, typically by automatically collecting the images that appear in search results for a celebrity's name. As a result, incorrect data can slip in, for example face images of a different person, which hurts the accuracy of the face recognition model. There is also a privacy problem: face images on the Web were never made available as training data for face recognition models. People whose face images were collected without their consent or knowledge may have their images used for unintended purposes. It is not a pleasant thought that your face image might be training some face recognition system somewhere in the world, and people whose face images have been used without permission have, understandably, strongly criticized the practice.
In the paper presented here, the authors propose a method called "SynFace" that solves these problems by using generated face images instead of those collected from the Internet.
In this SynFace, we also investigate the domain gap between face recognition models trained on real face images and those trained on generated face images. We also investigate how the number of samples per ID and the number of IDs in the training data affect the accuracy of the face recognition model. The paper provides a comprehensive evaluation of face recognition models trained on generated face images.
What is SynFace?
SynFace uses two main techniques. The first is DiscoFaceGAN. This technique maps random noise to five independent latent factors (ID, facial expression, illumination, face orientation, and background), as shown in the figure below, and can generate face images while controlling these factors.
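To make the idea concrete, here is a minimal sketch of this kind of controllable generation. The generator and the latent dimensions below are stand-ins of my own, not the real DiscoFaceGAN model; the point is only the interface: fix the identity latent while re-sampling the other factors, and you get varied images of the "same person".

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents():
    """Sample five independent latent factors (dimensions are illustrative)."""
    return {
        "identity":     rng.standard_normal(128),  # who the person is
        "expression":   rng.standard_normal(32),
        "illumination": rng.standard_normal(16),
        "pose":         rng.standard_normal(3),
        "background":   rng.standard_normal(64),
    }

def generate_face(latents):
    """Stand-in generator: returns a dummy 'image' so the flow is runnable."""
    flat = np.concatenate(list(latents.values()))
    return np.tanh(flat[:64]).reshape(8, 8)  # placeholder 8x8 image

# Keep identity fixed while re-sampling the other factors -> same "person",
# varied expression / illumination / pose / background.
base = sample_latents()
variant = sample_latents()
variant["identity"] = base["identity"]
img_a, img_b = generate_face(base), generate_face(variant)
```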
( From Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning)
The other is Mixup, a data augmentation technique that mixes two samples to generate a new one: a sample x is generated by interpolating the data xi and xj, and its label y by interpolating the corresponding labels yi and yj.
The overall picture of SynFace is shown in the figure below. First, a face image is generated in the part called "Mixup Face Generator". "Identity Mixup" is introduced here to generate a face image with a new ID from two IDs. As described later, this is introduced to increase the diversity of images within a single ID. Next, "Domain Mixup" is introduced. Here, the generated face images are mixed with real face images to reduce the difference in data distribution between real and generated face images. Together, these steps bridge the domain gap between generated and real face images, so that even a model trained on generated face images achieves high recognition accuracy on real face images.
Domain gap between real and generated face images
In this paper, we propose a method to use generated face images instead of real face images to solve the noise and privacy problems of correct labels. To verify the usefulness of this method, it is necessary to check whether the performance of this method is different from that of using real face images. As mentioned above, the results show that there is still a domain gap between training with generated face images and training with real face images.
Here, the authors build a face recognition model "RealFace" trained on real face images and a face recognition model "SynFace" trained on generated face images, and compare their performance. RealFace is trained on the real face image dataset CASIA-WebFace and tested on LFW (Labeled Faces in the Wild), while SynFace is trained on the generated face images "Syn_10K_50" and tested on "Syn-LFW", which consists of generated face images with the same statistics as LFW. For these two models, they compare the accuracy of verification, i.e., judging whether two face images show the same person or not.
The results are shown in the table below: SynFace shows a high accuracy of 99.85% on the generated face image dataset "Syn-LFW", but a low accuracy of 88.98% on the real face image dataset "LFW". In other words, if SynFace is applied to real face images as-is, accuracy degrades. Conversely, RealFace shows a high accuracy of 99.18% on LFW, but a low accuracy on Syn-LFW. In other words, there is a domain gap (a performance drop caused by the difference in data distribution between training data and test data).
Incidentally, Syn-LFW is built by using a 3D face reconstruction model to extract the latent factors representing facial expression, lighting, and face orientation from the LFW images and feeding them to DiscoFaceGAN, generating a dataset with the same structure as LFW. Syn_10K_50 samples 10,000 IDs (=10K) with 50 face images per ID (=50), following CASIA-WebFace, which is used for comparison from here on. In other words, in the naming Syn_N_S, N is the number of IDs and S is the number of samples per ID.
What causes the domain gap?
One of the reasons for the domain gap between RealFace and SynFace mentioned above is that the generated face images have less intra-class variation (variation among face images of the same person). The figures below show data from CASIA-WebFace (left) and Syn_10K_50 (right), where each row contains face images of the same person. It is hard to tell at a glance, but if you look closely, you can see that CASIA-WebFace has more variety in blurring, lighting, and face orientation. When the performance of SynFace on Syn_10K_50 was evaluated again after augmenting the data to increase the variety of blurring and illumination, its accuracy on LFW improved from 88.98% to 91.23%.
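The kind of augmentation described here can be sketched as follows. A 3x3 box blur and a random brightness scale are simple stand-ins of my own for the paper's actual blur/illumination augmentation pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def box_blur(img):
    """3x3 box blur with edge padding (a crude stand-in for camera blur)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def jitter_brightness(img, low=0.7, high=1.3):
    """Scale pixel intensities by a random factor and clip to [0, 1]."""
    return np.clip(img * rng.uniform(low, high), 0.0, 1.0)

img = rng.uniform(size=(8, 8))        # toy grayscale face image in [0, 1]
aug = jitter_brightness(box_blur(img))
```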
The paper also visualizes the variance of the facial feature vectors using MDS (see below): comparing Real (green, real face images) and Syn (generated face images), we can see that Real has a larger variance, i.e., the generated face images have lower diversity (intra-class variation) for the same person.
Introducing Mixup to DiscoFaceGAN
In SynFace, the diversity (intra-class variation) of face images for the same person was found to be lower than in real face images, resulting in poor recognition accuracy. To improve this diversity, the paper introduces the two mixups described above. One is "Identity Mixup" (red box in the figure below).
In "Identity Mixup", the diversity of face images for the same person is expanded by interpolating two different identities to generate a face image with a new, in-between identity. The mixing coefficient φ is sampled randomly from the range 0.0 to 1.0 in intervals of 0.05.
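A minimal sketch of Identity Mixup on the identity latent coefficients. The 128-dimensional latent vectors are random stand-ins for DiscoFaceGAN identity coefficients; φ is drawn from {0.0, 0.05, ..., 1.0} as in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def identity_mixup(z_id_a, z_id_b, phi):
    """Interpolate two identity latents to create a new, in-between identity."""
    return phi * z_id_a + (1.0 - phi) * z_id_b

z_a, z_b = rng.standard_normal(128), rng.standard_normal(128)
phi = rng.choice(np.arange(0.0, 1.0 + 1e-9, 0.05))  # 0.0, 0.05, ..., 1.0
z_new = identity_mixup(z_a, z_b, phi)  # feed to the generator as a new ID
```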
The other is "Domain Mixup" (red box in the figure below). Here, the generated face images are mixed with real face images to reduce the performance gap on real face images.
The synthetic face image XS and the real face image XR, together with their labels, are interpolated as follows. Here too, ψ is sampled randomly from the range 0.0 to 1.0 in intervals of 0.05.
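This interpolation can be sketched in the same way as standard Mixup, applied across domains; the toy images and one-hot labels below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def domain_mixup(x_syn, y_syn, x_real, y_real, psi):
    """Mix a batch of synthetic images with real images, and their labels."""
    x = psi * x_syn + (1.0 - psi) * x_real
    y = psi * y_syn + (1.0 - psi) * y_real
    return x, y

x_syn, x_real = rng.uniform(size=(2, 8, 8)), rng.uniform(size=(2, 8, 8))
y_syn = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot labels, synthetic IDs
y_real = np.array([[0.0, 1.0], [1.0, 0.0]])  # one-hot labels, real IDs
psi = rng.choice(np.arange(0.0, 1.0 + 1e-9, 0.05))  # 0.0, 0.05, ..., 1.0
x_mix, y_mix = domain_mixup(x_syn, y_syn, x_real, y_real, psi)
```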
Reducing the domain gap by introducing Mixup
The table below shows the effect of introducing Identity Mixup (IM). "Method" represents the training data, "LFW" the accuracy when tested on LFW, and "LFW (w/IM)" the accuracy on LFW when Identity Mixup is introduced. Note that the Method names follow the format "(number of IDs)_(number of images per ID)". The results show that introducing Identity Mixup improves accuracy under all conditions.
This table also shows the impact of changing the number of IDs (Width) and the number of images per ID (Depth). Comparing (a), (b), (c), and (i) to see the impact of the number of IDs (Width), accuracy improves from 83.85% to 88.75% going from (a) to (c), but the improvement from (c) to (i) is marginal. As mentioned above, this is due to the lack of intra-class variation in the face images for the same person. Next, comparing (d) to (i) shows the effect of the number of images per ID (Depth): accuracy also improves significantly as Depth increases. Finally, comparing (a) and (e), where the total number of images is the same, (e) shows higher accuracy, so the number of IDs (Width) in the dataset plays a more important role than the number of images per ID (Depth). (Similar results are obtained when comparing (b) and (f).)
The table below shows the effect of introducing Domain Mixup, with Syn_10K_50 as the baseline. For the other methods, Real_(number of IDs)_(number of images per ID) is a dataset constructed from real face images with the given number of IDs and images per ID, and Mix_(number of IDs)_(number of images per ID) is Syn_10K_50 mixed with real face images under that condition.
As can be seen from the table, Domain Mixup is effective in all cases: it improves the accuracy over Syn_10K_50, of course, but it is also more accurate than training with only real face images.
summary
In this paper, we use the generated face images to build a new large-scale dataset for face recognition and test its usefulness.
By comparing the performance of face recognition models trained on real face images with those trained on generated face images, the paper first shows that a large domain gap arises. It then shows that the dataset of generated face images has less face image diversity.
To solve the above problem, the paper introduces two mixups, Identity Mixup and Domain Mixup. These increase the diversity of face images and reduce the domain gap. It also investigates how the number of IDs (Width) and the number of images per ID (Depth) in the dataset affect performance, showing that increasing the number of IDs (Width) is more important. Furthermore, the experimental results on Domain Mixup show that mixing even a small number of real face images into the generated face images improves face recognition accuracy.
Looking at the trend of accuracy improvement in face recognition, large-scale face image datasets have been released one after another in the past few years, and along with that, significant improvement in accuracy has been observed in verification tests conducted by international organizations such as NIST. On the other hand, however, data noise and privacy issues have emerged, making it difficult to construct and use large-scale datasets.
I believe that the results of this paper can serve as a hint for overcoming the limitations of large data sets. As the technology for constructing face recognition models from generated face images improves in the future, we can expect not only further improvements in accuracy but also face recognition models that take bias and other factors into account. I think this is an area that will attract a lot of attention in the future.