FRCSyn Challenge Shows Potential For Face Recognition Technology With Synthetic Datasets (FRCSyn Challenge At WACV 2024: )

Face Recognition 11/12/2023

3 main points
✔️ Face recognition technology has privacy and data imbalance issues, and the use of synthetic datasets has been proposed as a way to address these challenges.
✔️ This new challenge leverages synthetic datasets to evaluate the performance of face recognition models and compares them to real datasets to analyze the usefulness of synthetic datasets.
✔️ The results of the FRCSyn Challenge show that synthetic datasets have the potential to reduce the racial bias of face recognition techniques and improve overall performance, and that combining synthetic and real datasets can be effective.

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data
written by Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Ivan DeAndres-Tame, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ, David Menotti, Alexander Unnervik, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Parsa Rahimi, Sébastien Marcel, Ioannis Sarridis, Christos Koutlis, Georgia Baltsou, Symeon Papadopoulos, Christos Diou, Nicolò Di Domenico, Guido Borghi, Lorenzo Pellegrini, Enrique Mas-Candela, Ángela Sánchez-Pérez, Andrea Atzori, Fadi Boutros, Naser Damer, Gianni Fenu, Mirko Marras
(Submitted on 17 Nov 2023)
Comment: WACV 2024 Workshops
Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Face recognition technology is used in a variety of applications, including surveillance cameras, immigration control, online identity authentication, and smartphone unlocking, and it has been studied extensively. Recent advances in deep learning, the formulation of margin-based loss functions, and the development of large datasets, among other factors, have led to its widespread adoption. At the same time, several challenges have emerged, many of them concerning datasets. Because face images are personal data, handling them raises privacy issues. Other open problems include limited training data, noisy labels, imbalances across identities and demographic groups, and low image resolution.

To overcome these challenges, approaches to constructing synthetic datasets have been actively investigated in recent years. Synthesizing realistic face images can alleviate privacy concerns, and properties such as resolution and demographic composition can be controlled at will.

This paper analyzes the results of the FRCSyn Challenge, a face recognition challenge (benchmark) that evaluates face recognition systems trained on synthetic data in order to verify the usefulness of such data.

In the FRCSyn Challenge, participants were given synthetic databases generated with two state-of-the-art (SOTA) generative frameworks, "DCFace" and "GANDiffFace." The goal is to provide insight into the use of synthetic data in face recognition by quantifying the difference in performance between models trained on real data and models trained on synthetic data. The Challenge also proposes a standard benchmark that the research community can easily reproduce.

Datasets

The FRCSyn Challenge lets participants download the datasets required to take part. Permission to redistribute these datasets has been obtained from their owners.

DCFace and GANDiffFace are provided as synthetic datasets. Below are samples of synthetic data generated by DCFace and GANDiffFace. In the FRCSyn Challenge, synthetic data are used only for training face recognition models, while evaluation is performed on real data to reflect realistic operational scenarios.

CASIA-WebFace and FFHQ are also provided as real training data. These are the datasets used to train the DCFace and GANDiffFace generative frameworks, respectively. This makes it possible to directly compare the traditional approach of training face recognition models on real data only with the new approach of training them on synthetic data.

Four real datasets, BUPT-BalancedFace, AgeDB, CFP-FP, and ROF, were used to evaluate the face recognition models. BUPT-BalancedFace is designed to address performance bias among different ethnic groups. It is re-labeled for ethnicity and gender using the FairFace classifier, and eight demographic groups are considered, formed from four ethnic groups (Asian, Black, Indian, and White) and two genders (female and male). AgeDB is a dataset that covers age variation, CFP-FP is a dataset that covers variation in facial pose, and ROF is a dataset that covers occlusion.

These datasets are widely used as benchmarks for face recognition models. Because different real datasets are used for training and for evaluation, the Challenge also analyzes how well the proposed face recognition models generalize.

FRCSyn Challenge

The FRCSyn Challenge is hosted on CodaLab, an open source framework for running scientific competitions and benchmarks. The Challenge aims to apply synthetic data to the training of face recognition models, with a particular focus on two key problems in current face recognition technology: reducing racial bias, and improving overall performance under challenging conditions involving age, changes in facial pose, occlusion, and diverse races.

To evaluate performance in these two areas, the FRCSyn Challenge defines two tasks, each consisting of two subtasks. The subtasks differ in the training data used for the face recognition model: one uses "synthetic data only" and the other "a combination of real and synthetic data." Each subtask is outlined in the table below.

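To measure the gap between training on real data and training on synthetic data, the Challenge computes GAP = (REAL - SYN) / SYN, where REAL is the verification accuracy of a model trained on real data only and SYN is the verification accuracy of the model trained on synthetic data (or, in the combined subtasks, on synthetic and real data). A negative GAP therefore means that the SYN model is the more accurate of the two. Task 1 additionally defines and evaluates the trade-off TO = AVG - SD between the mean (AVG) and the standard deviation (SD) of verification accuracy across demographic groups.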
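The two metrics can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the organizers' evaluation code: the function names and the sample accuracies are hypothetical, and the use of the population standard deviation is an assumption, since the article does not say which variant is used.

```python
from statistics import mean, pstdev


def trade_off(group_accuracies):
    """TO = AVG - SD over per-group verification accuracies (in %).

    Assumption: SD is the population standard deviation; the article
    does not state whether population or sample SD is used.
    """
    avg = mean(group_accuracies)
    sd = pstdev(group_accuracies)
    return avg - sd


def gap(real_acc, syn_acc):
    """GAP = (REAL - SYN) / SYN, returned as a fraction.

    Negative values mean the model trained with synthetic data
    is more accurate than the model trained on real data only.
    """
    return (real_acc - syn_acc) / syn_acc


# Hypothetical per-group accuracies, for illustration only.
example_groups = [90.0, 92.0, 94.0, 96.0]
example_to = trade_off(example_groups)

# Hypothetical REAL and SYN accuracies; multiply by 100 for a percentage.
example_gap = gap(95.0, 100.0)
```

A high TO requires both a high mean accuracy and a small spread between groups, which is why it is used as the ranking criterion for the bias-related task.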

Results

The table below shows the ranking for each task in the FRCSyn Challenge. Task 1 (subtasks 1.1 and 1.2), which focuses on reducing racial bias, is ranked in descending order of TO (trade-off), which is close to ascending order of SD, that is, from the least biased face recognition models to the most biased. In particular, for subtask 1.1, the top two teams show GAP values of -0.74% and -3.80%, respectively, meaning that their models trained on synthetic data scored higher than the corresponding models trained on real data. This indicates that synthetic data from DCFace and GANDiffFace have the potential to reduce bias in current face recognition techniques.
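As a rough illustration of how a ranking by TO relates to a ranking by SD, the sketch below sorts a few entries both ways. The team names and numbers are hypothetical placeholders, not results from the Challenge, and `rank_by_trade_off` is a name introduced here for illustration.

```python
def rank_by_trade_off(entries):
    """Sort (team, AVG, SD) tuples by TO = AVG - SD, highest first."""
    return sorted(entries, key=lambda e: e[1] - e[2], reverse=True)


# Hypothetical entries: (team name, AVG in %, SD in %).
entries = [
    ("team_a", 93.0, 2.0),  # TO = 91.0
    ("team_b", 94.0, 4.0),  # TO = 90.0
    ("team_c", 91.0, 1.5),  # TO = 89.5
]

by_to = [name for name, _, _ in rank_by_trade_off(entries)]
by_sd = [name for name, _, _ in sorted(entries, key=lambda e: e[2])]
```

In this made-up example the two orderings are similar but not identical, because a team with a very small SD can still rank lower when its mean accuracy is also low.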

In subtask 1.2, where real data are included in the training data, SD tends to decrease as AVG increases. As in subtask 1.1, the top team in subtask 1.2 has a negative GAP value, indicating that its model trained on a combination of synthetic and real data outperforms the face recognition model trained on real data alone.

For Task 2, the average accuracy across all evaluation datasets in subtasks 2.1 and 2.2 is lower than the accuracy achieved on BUPT-BalancedFace in subtasks 1.1 and 1.2. Training with synthetic data alone performs well in subtask 2.1, but the GAP values for the top five teams are positive, indicating that synthetic data alone is unlikely to fully replace real data. However, the top two teams in subtask 2.2 have negative GAP values, indicating that combining synthetic and real data can mitigate existing limitations.

Summary

The FRCSyn Challenge is an effort to explore how synthetic data can be applied to face recognition technology. There are currently several problems in this area, and the Challenge analyzes different ways to address them. A number of research groups have participated in the Challenge, each proposing a different approach. These approaches are compared through several smaller tasks (subtasks).

It is hoped that further analysis of these results in future studies will provide insights into high-performance face recognition techniques using synthetic data. The organizers of the FRCSyn Challenge are considering running the competition on the CodaLab platform on an ongoing basis. This could lead to the addition of new tasks and subtasks, and to further development of the Challenge.


Takumu: I have worked as a Project Manager/Product Manager and Researcher at internet advertising companies (DSP, DMP, etc.) and machine learning startups. Currently, I am a Product Manager for new business at an IT company. I also plan services utilizing data and machine learning, and conduct seminars related to machine learning and mathematics.
