Improved Generalization Performance With Single-Side Domain Generalization, An Asymmetric Learning Framework Based On Fake And Real Data Distributions!
3 main points
✔️ Propose a novel asymmetric end-to-end Single-Side Domain Generalization (SSDG) framework based on the feature that fake face images (Fake) have a larger distribution variance than real face images (Real).
✔️ Design Single-Side Adversarial Learning and Asymmetric Triplet Loss to achieve different optimizations suitable for Real and Fake, and improve the generalization performance of spoofing detection.
✔️ Achieves SOTA performance on representative datasets.
Single-Side Domain Generalization for Face Anti-Spoofing
written by Yunpei Jia, Jie Zhang, Shiguang Shan, Xilin Chen
(Submitted on 29 Apr 2020)
Comments: Accepted by CVPR2020 (oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, face recognition technology has become widely used in daily life, for example for smartphone login and access control. At the same time, various face spoofing methods (e.g., Print, Replay, 3D Mask) have been reported, which exposes us to significant security risks.
To tackle this problem, various face spoofing detection methods have been studied. Most existing state-of-the-art methods achieve high performance in intra-dataset evaluation, but still struggle in cross-dataset evaluation, where the training dataset (source domain) differs from the testing dataset (target domain).
This is because conventional methods do not take into account the distributional relationships between different domains, and thus biased features are learned for a particular dataset, resulting in insufficient generalization performance for unknown domains.
To address this problem, recent face spoofing detection work has introduced domain adaptation, which minimizes the distributional mismatch between source and target domains by using unlabeled target data. However, in many real-world scenarios, collecting large amounts of unlabeled target data for training is difficult and expensive, and sometimes no information about the target domain is available at all.
Therefore, some research has started to improve the generalization performance of face spoofing prevention by applying Domain Generalization (DG), which aims to train models using multiple existing source domains.
Traditional DG aims to learn a generalized feature space by aligning the distributions of multiple source domains. It then assumes that features extracted from unseen face images will be mapped close to this shared feature space, so that the model can successfully generalize to new domains.
The differences in the distribution of real face images in both the source and target domains are small, and a compact feature space can be learned relatively easily. On the other hand, the diversity of impersonation types and data collection methods makes it difficult to compactly summarize the features of fake face images in different domains.
Therefore, it is difficult to find a generalized feature space for fake face images, which may also affect the classification accuracy of the target domain.
Therefore, even if a compact feature space is obtained for both real and fake faces, as on the left side of the figure below (Conventional DG), it is difficult to learn a classifier that generalizes to a new target domain.
With this background, this paper proposes a method that compactly aggregates the features of all real face images while dispersing the features of fake face images from different domains, as on the right side of the above figure (Ours DG), under the constraint that real and fake face images are separated as far as possible. This allows us to learn class boundaries (Classifiers) with higher generalization performance.
The figure below shows an overview of the framework. It is divided into two main components.
The first is Single-Side Adversarial Learning. The goal is to learn a model with high generalization performance by collecting real face images from multiple domains (datasets) and learning to generalize them without distinguishing between domains.
The data variance of real face images is considered to be much smaller than that of fake face images. Therefore, it is relatively easy to learn a generalized feature space for real face images, and we believe that we can learn more general identification cues.
First, data from multiple domains (datasets) are divided into real face images (Xr) and fake face images (Xf), which are input to the Feature Generator for Real Faces (Gr) and the Feature Generator for Fake Faces (Gf), respectively, to extract the features (Zr, Zf).
After this, Single-Side Adversarial Learning is applied to the real images only, using a Domain Discriminator (D). A gradient reversal layer (GRL) is introduced so that G and D can be optimized simultaneously.
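The role of the GRL can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `GradientReversal`, the scaling factor `lam`, and the toy numbers are assumptions made for illustration. The layer passes features through unchanged in the forward direction and flips the sign of the gradient in the backward direction.

```python
class GradientReversal:
    """Toy gradient reversal layer (GRL) operating on plain Python lists.

    Forward pass: identity. Backward pass: gradient multiplied by -lam.
    `lam` is a hypothetical scaling factor, not a value from the article.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged on their way to the discriminator D.
        return list(features)

    def backward(self, grad_from_discriminator):
        # The gradient flowing back to G is sign-flipped, so a single backward
        # pass trains D to identify the domain while pushing G to hide it.
        return [-self.lam * g for g in grad_from_discriminator]


grl = GradientReversal(lam=1.0)
z_r = [0.2, -1.5, 0.7]                      # made-up real-face features
out = grl.forward(z_r)                      # identical to z_r
grad_to_g = grl.backward([0.5, -1.0, 2.0])  # made-up gradient from D
print(out, grad_to_g)
```

In a deep learning framework the same idea is normally written as a custom autograd operation; the list-based version above only shows the sign flip.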
The features of the real face images, Zr, are fed to D. D tries to identify the domain each feature came from, while G learns to produce features whose domain cannot be identified. The loss function (LAda) is the standard cross entropy, where YD denotes the correct domain labels.
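One way to write this adversarial objective, consistent with the description above, is the usual cross entropy over domain labels. The notation is a reconstruction and may differ from the paper: N is assumed to be the number of source domains, and the indicator selects the true domain of each real face image.

```latex
\min_{D}\,\max_{G}\; \mathcal{L}_{Ada}(G, D)
  = -\,\mathbb{E}_{(x,\,y)\sim (X_r,\,Y_D)}
    \sum_{n=1}^{N} \mathbb{1}[n = y]\,\log D\big(G(x)\big)
```

D minimizes this loss (it gets better at naming the domain), while G maximizes it (it makes real-face features from different domains indistinguishable).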
By introducing Single-Side Adversarial Learning, we learn a feature space with high generalization performance from real face images, which enables robust identification. This Single-Side Adversarial Learning is only applied to real face images, and for fake face images with large variance, another approach called Asymmetric Triplet Mining is taken, which is explained next.
The second is Asymmetric Triplet Mining. It trains the model to map real face images more compactly and fake face images more dispersedly.
As explained in the introduction, real face images have high similarity and relatively small variance, so they can be treated as the same group even if they have different domains. However, fake face images have a much larger variance due to the different types of impersonation and data collection methods, making it difficult to learn a generalized feature space as well as real face images.
Here, we take into account the features with large differences in distribution and learn a distributed feature space for fake face images, as shown in the figure below. The fake face images (Fake) in each domain are represented by circles, squares, and triangles, respectively, and the real face images (Real) in each domain are represented by x's of different colors. Asymmetric Triplet Mining separates the Fakes of different domains and aggregates the Real domains. At the same time, all Fakes are separated from Real.
As a result, the features of fake face images can be more dispersed in the feature space and the class boundaries can generalize better.
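The grouping rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `asymmetric_group` and `mine_triplets`, the domain names `d1`, `d2`, `d3`, and the exhaustive enumeration of triplets are all assumptions. Real faces from every domain share one group, while fake faces form one group per domain.

```python
from itertools import permutations


def asymmetric_group(is_real, domain):
    """All real faces share one group; fakes get one group per source domain."""
    return "real" if is_real else f"fake_{domain}"


def mine_triplets(samples):
    """Return (anchor, positive, negative) index triplets.

    The positive shares the anchor's group; the negative comes from any
    other group (a fake of another domain, or a sample of the other class).
    """
    groups = [asymmetric_group(is_real, domain) for is_real, domain in samples]
    triplets = []
    for a, p in permutations(range(len(samples)), 2):
        if groups[a] != groups[p]:
            continue
        for n in range(len(samples)):
            if groups[n] != groups[a]:
                triplets.append((a, p, n))
    return triplets


# (is_real, domain) pairs; the domain names are hypothetical placeholders.
samples = [
    (True, "d1"), (True, "d2"), (True, "d3"),     # real faces, three domains
    (False, "d1"), (False, "d1"), (False, "d2"),  # fake faces
]
triplets = mine_triplets(samples)
print(len(triplets), triplets[:3])
```

Under this rule a real face from one domain can be the positive for a real anchor from another domain, whereas two fakes from different domains are never paired as anchor and positive.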
In Asymmetric Triplet Mining, G is optimized with a triplet loss function (LAsTrip), where x_i^a is the anchor, x_i^p is the positive sample, x_i^n is the negative sample, and α is a pre-defined margin.
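With the anchor, positive, negative, and margin defined as above, the loss can be written in the standard triplet-loss form. This is a reconstruction under assumptions: f(·) is assumed to denote the feature produced by G, and the hinge [·]_+ follows the common triplet-loss convention rather than anything stated in this article.

```latex
\mathcal{L}_{AsTrip}(G) = \sum_{i}
  \Big[\, \big\| f(x_i^{a}) - f(x_i^{p}) \big\|_2^2
        - \big\| f(x_i^{a}) - f(x_i^{n}) \big\|_2^2 + \alpha \,\Big]_{+}
```

The asymmetry does not come from the formula itself but from how positives and negatives are chosen: all real faces count as one class, while fakes from each domain count as separate classes.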
Normalization of features and weights is also introduced to further improve the generalization capability. Normalization has been found to be effective in the field of face recognition. The norm of a feature is closely related to the quality of the image, and the varying data collection conditions (e.g., camera quality) in each domain therefore also affect the generalization performance.
Here, we apply l2 normalization to the output of G so that all features share the same Euclidean norm, further improving the performance of face spoofing prevention. We apply l2 normalization to the weights as well.
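As a small illustration of the feature normalization step, the sketch below rescales two made-up feature vectors that point in the same direction but have very different norms. The helper name `l2_normalize`, the `eps` guard, and the example values are assumptions, not details from the paper.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


# Two hypothetical features with the same direction but different norms,
# e.g. from a low-quality and a high-quality camera.
low_quality = [0.3, 0.4]    # norm 0.5
high_quality = [6.0, 8.0]   # norm 10.0

a = l2_normalize(low_quality)
b = l2_normalize(high_quality)
print(a, b)  # both vectors now lie on the unit circle
```

After normalization only the direction of the feature matters, so differences in feature norm caused by image quality no longer influence distances between features.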
The loss function for the entire model combines the classification loss, the adversarial loss, and the asymmetric triplet loss.
Since every domain contains a label, the Classifier for face impersonation detection is implemented after G, as shown in the first model overview diagram. Both the Classifier and G for face impersonation prevention are optimized by cross entropy (LCls). All components are trained end-to-end.
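A plausible form of the combined objective is a weighted sum of the three losses named in this article. The balancing weights λ1 and λ2 are assumptions introduced here for illustration; the article does not state them.

```latex
\mathcal{L}_{SSDG} = \mathcal{L}_{Cls}
  + \lambda_1\,\mathcal{L}_{Ada}
  + \lambda_2\,\mathcal{L}_{AsTrip}
```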
In this paper, two different architectures are employed as the Feature Generator for comparison. One is the architecture of MADDG, presented at CVPR 2019, and the other is based on ResNet-18. In the following, these two architectures are denoted by -M and -R. We also use the following four public datasets.
- OULU-NPU (notation: O)
- CASIA-FASD (notation: C)
- Idiap Replay-Attack (notation: I)
- MSU-MFSD (notation: M)
In this paper, one dataset is selected as the target domain for testing and the other three are the source domains for training. Thus, there are a total of four tests: O & C & I to M, O & M & I to C, O & C & M to I, and I & C & M to O.
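The four tests follow a leave-one-domain-out pattern, which can be enumerated with a short sketch. The function name `leave_one_domain_out` is hypothetical, and the code only lists the splits; it does not load any data.

```python
DATASETS = ["O", "C", "I", "M"]  # dataset notation from the list above


def leave_one_domain_out(datasets):
    """Pair each held-out target domain with the remaining source domains."""
    splits = []
    for target in datasets:
        sources = [d for d in datasets if d != target]
        splits.append((sources, target))
    return splits


splits = leave_one_domain_out(DATASETS)
for sources, target in splits:
    print(" & ".join(sources), "to", target)
```

Each dataset appears exactly once as the unseen target, so no target data is used during training in any of the four tests.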
Comparison with baseline model
We compare the SSDG proposed in this paper with a corresponding baseline model. The baseline is designed as a contrast: instead of optimizing real and fake face images asymmetrically, it treats both in the same way.
In the baseline, after the Feature Generator, we add another Domain Discriminator to perform Adversarial Learning on both real and fake face image features. Also, Asymmetric Triplet Loss is replaced by Triplet Loss, and real and fake are all aggregated together. For the baselines, the two different architectures mentioned earlier are also used and are shown as BDG-M and BDG-R respectively.
The comparison results can be seen in the table below. As we will see later, the performance of BDG-M is at a high level, comparable to the performance of the state-of-the-art MADDG.
The average HTER across all tests for BDG-M and MADDG is 23.09% and 23.06% respectively. On the other hand, SSDG-M has an average HTER of 20.79% across all tests, which is lower (better) than both BDG-M and MADDG.
This indicates that it is difficult to find a generalized feature space for fake face images. In other words, for the task of face impersonation detection, asymmetric optimization of real and fake face images is more effective with better generalization performance.
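For readers unfamiliar with the metric, HTER (Half Total Error Rate) is the mean of the false acceptance rate and the false rejection rate, so lower values are better. The sketch below computes it from made-up counts; the function name `hter`, its parameters, and the numbers are illustrative assumptions and do not reproduce any result in the table.

```python
def hter(false_accepts, num_attacks, false_rejects, num_real):
    """Half Total Error Rate: mean of false acceptance and false rejection rates."""
    far = false_accepts / num_attacks  # fake faces wrongly accepted as real
    frr = false_rejects / num_real     # real faces wrongly rejected as fake
    return (far + frr) / 2


# Made-up counts for illustration only.
value = hter(false_accepts=30, num_attacks=100, false_rejects=10, num_real=100)
print(f"HTER = {value:.2%}")
```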
Comparison with SOTA model
As the chart and table below show, the SSDG outperforms the state-of-the-art model in all four tests.
For models other than MADDG, this is likely because they do not consider the inherent distributional relationships between different domains. As a result, they extract only features that are biased toward the datasets used for training, which leads to a significant performance degradation on unknown datasets.
As for MADDG, Domain Generalization (DG) allows it to extract features that provide more general identification cues. However, since the distributions of real and fake face images are very different, a generalized feature space that fits both is difficult to optimize, and its accuracy is somewhat inferior.
Due to the variety of spoofing types and data collection methods in the dataset, the features extracted from fake face images are more widely distributed in the feature space than real face images, and it is not easy to aggregate all of them from different domains.
The proposed SSDG mitigates this problem by applying asymmetric optimization to the real and fake face images, which is believed to yield a more generalized feature space and improve performance.
Furthermore, using the ResNet18-based network (SSDG-R) provides a significant improvement over SSDG-M. This indicates that higher performance can be achieved when SSDG is combined with more effective models.
Visualization of SSDG
In the figure below, the Class Activation Map (CAM) is visualized using Grad-CAM.
The maps show that SSDG consistently focuses on the face region and looks for effective cues for identification, rather than on domain-specific backgrounds and lighting. This results in a high generalization performance for unknown domains.
Furthermore, as shown in the figure below, we randomly select 200 samples of each category from the four databases, visualize them with t-SNE, and analyze the feature spaces learned by SSDG and BDG. It can be seen that SSDG disperses the features of fake face images in the feature space more than BDG does, while the feature distribution of real face images is more compact. Therefore, SSDG can achieve more appropriate class boundaries and generalize properly to the target domain.
In this paper, we propose a novel end-to-end Single-Side Domain Generalization (SSDG) framework to improve the generalization capability of face spoofing prevention.
SSDG learns a more generalized feature space. Here, the feature distribution of real face images (Real) is made more compact, while the feature distribution of fake face images (Fake) is dispersed across domains. In other words, unlike existing methods that treat real and fake face images symmetrically, SSDG optimizes each of them differently.
To this end, Single-Side Adversarial Learning and Asymmetric Triplet Loss are designed to aggregate the real face images into a more compact region and to separate the fake face images of different domains.
Extensive experiments show that SSDG is effective and can achieve SOTA results on four public datasets.
As mentioned throughout this paper, these results suggest that the feature distributions of real and fake face images are indeed different, and treating them asymmetrically may lead to higher generalization performance for unknown domains.
In the future, it is expected that further studies will be conducted to design other asymmetries, such as separating fake face images according to the type of impersonation (Print, Replay, 3D Mask, etc.) as well as the data set.