Next-generation Deep-fake Detection Technology Using Frequency Masks

Fake Detection 29/07/2024

3 main points
✔️ Considers data augmentation with frequency masking
✔️ Improves the generalization performance of deepfake detection
✔️ Demonstrates effectiveness for every model validated in the experiments

Frequency Masking for Universal Deepfake Detection
written by Chandler Timm Doloriel, Ngai-Man Cheung
(Submitted on 17 Jan 2024)
Comments: Accepted to IEEE ICASSP-2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

This study discusses the effectiveness of data augmentation using frequency masking in the context of deepfake image detection. In particular, it examines masking in both the real space and the frequency space of images and demonstrates that frequency masking improves the generalization performance of fake detection techniques.

Background

Fake image detection as a social issue

In recent years, with the remarkable development of AI technology based on diffusion models and similar methods, it has become possible to generate fake images that are difficult even for humans to identify. This is a social problem that undermines the reliability of information in modern society. For example, there is a risk that political or social statements attributed to celebrities may be fabricated. Therefore, there is a need to establish a general-purpose method to detect whether an image is AI-generated or real.

Difficulties in generic fake image detection

Recent years have seen a wide variety of AI-based image generation techniques. The diversity of the generation models makes it difficult to establish a "generic" fake image detection technique. Many studies have shown that AI can be used to detect AI-generated fake images, but their generalization performance is limited. This study proposes frequency masking to improve the generalization performance of fake detection techniques and demonstrates its effectiveness through experiments. The generality of the approach is notable: it can be applied to whichever detection model is adopted, and it may serve as a basis for the next generation of deepfake detection technology.

Method

Figure 1 provides an overview of the masking proposed in this study. The following sections describe masking in the real space and in the frequency space of the image in turn.

Masking in real space

The paper mentions two masking methods in real space: (i) Patch Masking and (ii) Pixel Masking. The former masks square regions of $p\times p$ pixels, while the latter masks individual pixels. An example of masking in real space is shown in Figure 1(a). As shown there, the masking process (filling in with black) is applied to randomly selected masking regions in the image. This idea is the basis for the masking in the frequency domain described next.
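The two real-space masks can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `pixel_mask` and `patch_mask`, the default patch size of 8, the non-overlapping patch grid, and the default ratio of 0.15 are all assumptions made for the example.

```python
import numpy as np

def pixel_mask(image, ratio=0.15, rng=None):
    """Set a random `ratio` of pixel positions to zero (black)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    n_masked = int(round(ratio * h * w))
    # Choose distinct pixel positions to mask.
    idx = rng.choice(h * w, size=n_masked, replace=False)
    keep = np.ones(h * w, dtype=bool)
    keep[idx] = False
    keep = keep.reshape(h, w)
    out = image.copy()
    out[~keep] = 0  # a 2-D mask also zeroes every channel of an H x W x C image
    return out

def patch_mask(image, ratio=0.15, patch=8, rng=None):
    """Set a random `ratio` of non-overlapping p x p patches to zero."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch  # patch grid (assumed layout)
    n_masked = int(round(ratio * gh * gw))
    idx = rng.choice(gh * gw, size=n_masked, replace=False)
    grid = np.ones(gh * gw, dtype=bool)
    grid[idx] = False
    grid = grid.reshape(gh, gw)
    # Expand each grid cell to a p x p block of pixels.
    block = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    keep = np.ones((h, w), dtype=bool)  # leftover border pixels stay unmasked
    keep[: gh * patch, : gw * patch] = block
    out = image.copy()
    out[~keep] = 0
    return out
```

Both functions return a masked copy and leave the input untouched, so they can be applied on the fly to each training image.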

Masking in frequency space

Building on the real-space masking described above, the authors introduced a masking process in the frequency domain. They define masking regions over four parts of the image's Fast Fourier Transform: (i) the low-frequency region (Low), (ii) the mid-frequency region (Mid), (iii) the high-frequency region (High), and (iv) all regions (All). Within each target region, masking is applied by randomly setting a given proportion of the frequency components to zero. The low-frequency regions correspond to relatively large structures in the image, the mid-frequency regions correspond to image texture and more detailed features, and the high-frequency regions correspond to noise and edges in the image. Although features in the high-frequency region may seem less important, the authors point out that the high-frequency component may be the key to detection because images created by a generative model may contain small artifacts (data distortion).
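A frequency mask of this kind can be sketched with NumPy's FFT routines. The sketch below is an illustration under stated assumptions rather than the paper's implementation: the function name `frequency_mask`, the split of the spectrum into thirds by normalized radial frequency, and taking the real part after the inverse transform are all choices made for the example.

```python
import numpy as np

def frequency_mask(image, ratio=0.15, band="all", rng=None):
    """Zero a random `ratio` of FFT coefficients inside one frequency band."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]

    # Radial frequency of each coefficient in the centered spectrum,
    # normalized so that the largest possible value is 1.
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / np.sqrt(0.5)

    # Band limits are hypothetical: equal thirds of the normalized radius.
    bands = {
        "low": (0.0, 1 / 3),
        "mid": (1 / 3, 2 / 3),
        "high": (2 / 3, np.inf),
        "all": (0.0, np.inf),
    }
    lo, hi = bands[band]
    in_band = (radius >= lo) & (radius < hi)

    # Each coefficient in the band is dropped independently with probability `ratio`.
    drop = in_band & (rng.random((h, w)) < ratio)

    # Forward FFT over the two spatial axes, with the DC term moved to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    spectrum[drop] = 0

    # Back to real space; the mask is not symmetric, so keep only the real part.
    restored = np.fft.ifft2(np.fft.ifftshift(spectrum, axes=(0, 1)), axes=(0, 1))
    return restored.real
```

For example, `frequency_mask(img, ratio=0.15, band="high")` would perturb only the fine detail of `img`, while `band="all"` spreads the dropped coefficients over the whole spectrum.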

Experimental results

Dataset

In this study, a dataset consisting of fake images created using ProGAN was used for training. The authors then validated the trained models on fake images created by various models (ProGAN, CycleGAN, BigGAN, StyleGAN, GauGAN, StarGAN, SITD, SAN, CRN, IMLE, Guided Diffusion, Latent Diffusion, DALL-E, etc.).

Comparison of each masking process

Figure 2 outlines a comparison of each masking process. Interestingly, a comparison of Pixel Masking and Patch Masking shows that Patch Masking is superior. It also suggests that masking in frequency space is even better than masking in real space.

The authors also discuss the effect of the masking percentage on detection accuracy. Table 1 summarizes the fake detection accuracy when the percentage of masking is varied. Among the percentages compared, the highest accuracy is achieved when masking is applied at 15%. Therefore, the authors set that as the standard percentage for the masking process.

Table 1: The effect of the percentage of areas subjected to masking on the accuracy of fake detection.

Comparison by frequency range subject to masking process

In addition, the authors examined how the accuracy of fake detection is affected by the frequency range in which the masking process is applied. Table 2 summarizes the accuracy for each frequency range. The results are interesting in that they suggest that the characteristic frequency bands differ depending on the generative model used to create the fake image. At the same time, they illustrate why improving the generalization performance of fake image detection is difficult.

Table 2: Effect of frequency masking on the accuracy of fake detection depending on the frequency range covered by the frequency masking.

Performance when applied to state-of-the-art detectors

Finally, as a culmination of this research, the authors discuss the effectiveness of combining state-of-the-art (SOTA) detectors with their proposed frequency masking. Table 3 summarizes the validation results for each validation dataset. The average accuracy is also shown on the right side of Table 3. Importantly, applying the authors' frequency masking, as indicated in red in the table, consistently improves the accuracy of the SOTA detectors. This result suggests that frequency masking is a data augmentation technique that can improve the accuracy of fake detection in general. In other words, frequency masking appears to help the model select the features that are important for fake image detection rather than being drawn to superficial features of the image.

Table 3: Change in accuracy of each model due to frequency masking.

Summary

Motivated by recent improvements in model accuracy through masking processes, the authors discussed the effectiveness of frequency masking in the context of fake image detection. The results suggest that frequency masking generally improves the accuracy of fake detection models for fake images created by various generative models. It also worked robustly regardless of the model employed. The results indicate that frequency masking can serve as a general data augmentation to support general-purpose fake detection tools and is expected to be deployed in a variety of models in the future.