[FreqNet] Generic Deepfake Detection by Learning in Frequency Space
3 main points
✔️ Proposed a method called FreqNet that integrates frequency information and CNN-based features
✔️ Introduces two modules: a high-frequency representation and a frequency convolution layer
✔️ Achieves state-of-the-art performance with a network of only 1.9 million parameters
Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning
written by Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, Yunchao Wei
(Submitted on 12 Mar 2024)
Comments: 9 pages, 4 figures, AAAI24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
This study proposes FreqNet, shown in Figure 1. Conventional CNN-based fake detectors that take frequency characteristics into account convert the image into the frequency domain and train the CNN on this representation to decide whether the image is real or fake. FreqNet instead builds frequency analysis into the classifier itself, combining convolutional layers with frequency-domain operations, with the aim of obtaining a general-purpose fake detector. As a result, it achieves state-of-the-art performance with very few parameters.
Background
Where We Are Now with Deep Fake Detection Technology
In recent years, the remarkable progress of AI technologies such as GANs and diffusion models has made it possible to generate fake images that can be mistaken for real ones. The growing number of such realistic fakes may have unforeseen social consequences, and various deepfake detection techniques have been studied in response.
However, most existing deepfake detection techniques are trained only on images created in a particular domain or by a particular generative model, so their detection performance is limited to that setting. This severely hinders their ability to work in unseen domains, including unknown generative models and new image categories, which makes the development of generic deepfake detection techniques highly desirable.
Frequency response of images generated by GANs
It is well known in deepfake research that GAN-generated images have a characteristic frequency response. Figure 2 summarizes a frequency analysis of GAN-generated images and shows that frequency characteristics can indeed serve as a clue for distinguishing GAN-generated images from real ones. However, comparing the images produced by different GANs shows that their frequency characteristics, while similar, are not identical. This variation underscores the difficulty of building a general-purpose deepfake detector based on frequency characteristics.
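Figure 2's analysis is not reproduced here. As a toy illustration of why the spectrum can expose generator artifacts, the sketch below computes a centered log-magnitude spectrum with NumPy and shows that a faint checkerboard pattern, the kind of periodic artifact upsampling layers are known to leave, appears as a sharp peak at the highest spatial frequency. The images are synthetic, and the function name `log_magnitude_spectrum` is hypothetical.

```python
import numpy as np

def log_magnitude_spectrum(img: np.ndarray) -> np.ndarray:
    """Centered log-magnitude spectrum of a 2-D grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))

# Toy data (hypothetical, not from the paper): a smooth "real" image and the
# same image with a faint checkerboard, mimicking an upsampling artifact.
n = 64
y, x = np.mgrid[0:n, 0:n]
smooth = np.sin(2 * np.pi * x / n) + np.cos(2 * np.pi * y / n)
checker = 0.05 * (-1.0) ** (x + y)
fake_like = smooth + checker

real_spec = log_magnitude_spectrum(smooth)
fake_spec = log_magnitude_spectrum(fake_like)

# For even n, fftshift moves the Nyquist frequency (both axes) to index (0, 0).
print(f"Nyquist bin, smooth image:      {real_spec[0, 0]:.3f}")
print(f"Nyquist bin, with checkerboard: {fake_spec[0, 0]:.3f}")
```

The smooth image has essentially no energy at the Nyquist bin, while the checkerboard, invisible at this amplitude in pixel space, stands out clearly in the spectrum.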
Proposed method: FreqNet
Problem statement: Toward a general-purpose deepfake detection technology
In this research, a generic deepfake detection technique is defined as a method that is trained only on data from a specific domain or generative model, yet can detect fake images from other domains or generative models.
FreqNet
Figure 3 shows a schematic of FreqNet. Each component is described below.
(a) High-Frequency Representation of Image
As suggested by previous studies, the authors point out that high-frequency (fine-detail) distortions are important for distinguishing real images from fake ones. To extract the high-frequency components of an image, they propose the HFRI block, which applies the fast Fourier transform (FFT) to the input image, removes the low-frequency components with a high-pass filter, and applies the inverse FFT to return to the spatial domain, so that only the high-frequency components remain.
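The FFT, high-pass, inverse-FFT pipeline can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the high-pass filter is assumed to zero a centered square block of the shifted spectrum, and the function name `high_freq_representation` and the `cutoff_ratio` parameter are hypothetical.

```python
import numpy as np

def high_freq_representation(img: np.ndarray, cutoff_ratio: float = 0.25) -> np.ndarray:
    """Hypothetical HFRI-style filter: FFT -> zero the low-frequency center -> inverse FFT.

    img: array of shape (C, H, W); the FFT is taken over the spatial axes (H, W).
    cutoff_ratio: assumed half-width of the removed low-frequency block, as a
    fraction of H and W (0.25 removes the central half of each axis; this value
    is illustrative, not taken from the paper).
    """
    _, h, w = img.shape
    # Move the zero frequency to the center so low frequencies form a central block.
    spec = np.fft.fftshift(np.fft.fft2(img, axes=(-2, -1)), axes=(-2, -1))
    ch, cw = h // 2, w // 2
    rh, rw = int(h * cutoff_ratio), int(w * cutoff_ratio)
    spec[:, ch - rh:ch + rh, cw - rw:cw + rw] = 0  # high-pass: drop the low-frequency block
    spec = np.fft.ifftshift(spec, axes=(-2, -1))
    # The input is real, so keep the real part of the inverse transform.
    return np.real(np.fft.ifft2(spec, axes=(-2, -1)))

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))  # toy 3-channel image
high = high_freq_representation(image)
print(high.shape)  # (3, 32, 32)
```

A flat image, which is pure low frequency, maps to all zeros, while a one-pixel checkerboard passes through unchanged.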
(b) High-Frequency Representation of Feature
To further improve the generalizability of detection, the authors also introduce a mechanism that consistently focuses on high-frequency components in the intermediate feature maps extracted by the CNN. Specifically, as shown in Figure 3(b), high-frequency components are extracted along the spatial dimensions $(W,H)$ and along the channel dimension ($C$) of the feature space, in a manner similar to the HFRI block shown in (a), and this mechanism is incorporated into the CNN.
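The same idea extends to feature maps by choosing which axes the FFT runs over: the spatial axes $(H,W)$ or the channel axis $C$. The sketch below is a hedged illustration, not the paper's implementation. The helper `high_pass` and the wrapper `high_freq_feature` are hypothetical, and applying the spatial and channel filters in sequence with a shared cutoff is an assumption.

```python
import numpy as np

def high_pass(x: np.ndarray, axes: tuple, cutoff_ratio: float = 0.25) -> np.ndarray:
    """Zero a centered low-frequency block of x over the given axes (hypothetical helper)."""
    spec = np.fft.fftshift(np.fft.fftn(x, axes=axes), axes=axes)
    index = [slice(None)] * x.ndim
    for ax in axes:
        n = x.shape[ax]
        center, half_width = n // 2, max(1, int(n * cutoff_ratio))
        index[ax] = slice(center - half_width, center + half_width)
    spec[tuple(index)] = 0
    return np.real(np.fft.ifftn(np.fft.ifftshift(spec, axes=axes), axes=axes))

def high_freq_feature(feat: np.ndarray, cutoff_ratio: float = 0.25) -> np.ndarray:
    """Sketch of an HFRF-style step on a (B, C, H, W) feature map.

    Spatial high-pass over (H, W), then channel high-pass over C. The ordering
    and the cutoff value are assumptions for illustration.
    """
    spatial = high_pass(feat, axes=(-2, -1), cutoff_ratio=cutoff_ratio)
    return high_pass(spatial, axes=(-3,), cutoff_ratio=cutoff_ratio)

feat = np.random.default_rng(0).standard_normal((2, 8, 16, 16))  # toy (B, C, H, W)
print(high_freq_feature(feat).shape)  # (2, 8, 16, 16)
```

Along the channel axis, a feature that is identical in every channel is pure "low frequency" and is removed entirely; only variation across channels survives.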
(c) Frequency Convolutional Layer
Most approaches that use frequency information to train fake classifiers extract frequency information from the image and use it as input to a classifier such as a CNN. However, the authors point out that this approach may cause the classifier to overfit to the specific distortions found in the training images. To improve the generalization of the detector, they instead introduce learning in frequency space. Specifically, the features output by a convolutional layer are transformed into the frequency domain by the FFT, convolutions are applied separately to the amplitude and phase spectra, and an inverse FFT brings the result back to the spatial domain. This module is called the frequency convolutional layer (FCL).
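A minimal NumPy sketch of this FFT, amplitude/phase convolution, inverse-FFT sequence is shown below. It is an illustration under assumptions, not the authors' layer: the convolutions are assumed to be 1x1 (channel-mixing) with no activations, and the names `pointwise_conv`, `frequency_conv_layer`, `w_amp`, and `w_phase` are hypothetical.

```python
import numpy as np

def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution over channels: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def frequency_conv_layer(feat: np.ndarray, w_amp: np.ndarray, w_phase: np.ndarray) -> np.ndarray:
    """Hypothetical FCL-style layer on a real (C, H, W) feature map.

    2-D FFT over (H, W), separate 1x1 convolutions on the amplitude and phase
    spectra (kernel size is an assumption), recombination into a complex
    spectrum, and an inverse FFT back to the spatial domain.
    """
    spec = np.fft.fft2(feat, axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)
    amp = pointwise_conv(amp, w_amp)        # learnable mixing of amplitude spectra
    phase = pointwise_conv(phase, w_phase)  # learnable mixing of phase spectra
    recombined = amp * np.exp(1j * phase)
    # Taking the real part discards any imaginary residue after the inverse FFT.
    return np.real(np.fft.ifft2(recombined, axes=(-2, -1)))

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 8))  # toy (C, H, W) feature map
w_amp, w_phase = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))  # hypothetical weights
print(frequency_conv_layer(feat, w_amp, w_phase).shape)  # (4, 8, 8)
```

With identity weights the layer reproduces its input exactly, which is a convenient sanity check; in training, `w_amp` and `w_phase` would be learned so that the network operates directly on frequency-domain features.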
Experimental results
Dataset
The dataset consisted of 18,000 fake images for 20 categories generated by ProGAN. The model was trained on this dataset, and its generalization performance was verified on test data created by 17 different generative models.
Tables 1 and 2 compare FreqNet with several previous detection models. The results indicate that FreqNet performs well on both test datasets, and in many cases it achieves state-of-the-art performance in terms of the average over each test dataset.
The authors also examine the number of parameters. Table 3 compares the number of parameters and the accuracy of several representative models. Interestingly, despite having significantly fewer parameters, FreqNet outperforms the previous models in accuracy. This suggests that FreqNet distinguishes real from fake images far more efficiently than previous models.
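To put 1.9 million parameters in perspective, note that the cost of a convolution grows with the product of its input and output channel widths. The helper below is a hypothetical illustration that counts the weights and biases of one standard 2-D convolution; the layer widths are arbitrary examples and do not reproduce FreqNet's configuration, which the article does not give.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a standard k x k 2-D convolution layer."""
    weights = c_out * c_in * k * k
    return weights + (c_out if bias else 0)

# Hypothetical layer widths, for illustration only.
print(conv2d_params(64, 64, 3))    # 36928
print(conv2d_params(512, 512, 3))  # 2359808
```

A single 3x3 convolution with 512 input and 512 output channels already holds about 2.36 million parameters, more than the 1.9 million reported for the whole of FreqNet, so keeping the channel widths narrow is what makes such a small model possible.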
In addition, the authors performed an ablation analysis of each FreqNet component. Removing any of the components degraded accuracy, indicating that each component functions as intended and contributes to the improved accuracy.
Finally, the authors visualized Class Activation Maps (CAMs) for several images. For the fake images shown in (a) and (b), the maps respond strongly to local features in the image, whereas for the real image shown in (c), the response is spread roughly evenly over the entire image. Interestingly, FreqNet also effectively recognizes face photographs, even though the training data in this study consist of cars, cats, chairs, and horses. This result suggests the high versatility of FreqNet.
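The article does not say which CAM variant the authors used. As a reference point, the sketch below computes the original class activation map (Zhou et al., CVPR 2016), which weights the last convolutional feature maps by the classifier weights of the target class. It assumes a network that ends in global average pooling followed by a linear layer; the function name and the feature values are illustrative.

```python
import numpy as np

def class_activation_map(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Original CAM: class-weighted sum of the last convolutional feature maps.

    features: (C, H, W) activations just before global average pooling.
    class_weights: (C,) linear-classifier weights for the target class
    (for example the "fake" output of a binary detector).
    Returns an (H, W) map rescaled so its maximum is 1.
    """
    cam = np.tensordot(class_weights, features, axes=1)  # sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)  # keep only evidence in favor of the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy feature maps: channel 0 fires at one location, channel 1 is uniform.
feats = np.zeros((2, 4, 4))
feats[0, 1, 2] = 3.0
feats[1] = 1.0
print(class_activation_map(feats, np.array([1.0, 0.0]))[1, 2])  # 1.0
```

Weighting the localized channel yields a map concentrated on one spot, similar to the fake examples, while weighting the uniform channel yields a flat map, similar to the real example.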
Summary
The authors proposed FreqNet, a lightweight model for generic detection of fake images created by various generative models. A salient feature of FreqNet is the explicit incorporation of frequency analysis into the network framework. As a result, FreqNet achieves state-of-the-art performance with far fewer parameters than conventional models, which supports the validity of the approach taken in this study.
On the other hand, the authors focus mainly on fake images created by GANs, and it remains to be seen whether FreqNet also works for images created by other generative models such as diffusion models. Extending FreqNet to images from diffusion models and other generators would be a natural next step toward greater versatility.