Eliminate Attack Noise! Introducing Two-stream Restoration Network
3 main points
✔️ Proposed TRN, a method for removing attack noise from images under adversarial attack.
✔️ TRN infers the original image from the adversarial example and its gradient
✔️ Achieved higher performance than previously proposed defense methods
An Eye for an Eye: Defending against Gradient-based Attacks with Gradients
written by Hanbin Hong, Yuan Hong, Yu Kong
(Submitted on 2 Feb 2022)
Comments: Published on arXiv.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The images used in this article are from the paper, the introductory slides, or were created based on them.
This work proposes to defend against gradient-based adversarial attacks by restoring the original image from the attacked image (the adversarial example) and its gradient. This approach achieves better performance than previous defense methods. The restored images also show characteristic patterns for each dataset, which are discussed briefly at the end of this article.
What is an adversarial attack?
Adversarial attacks add noise to the input data so that the AI makes incorrect decisions. There are various ways to add such noise, but this research focuses on gradient-based attack methods, which include the following.
- FGSM (Fast Gradient Sign Method)
- PGD (Projected Gradient Descent)
- MIM (Momentum Iterative Method)
What all these methods have in common is that they add noise to the input data such that the model's loss on the correct label increases. Such noise can be obtained by solving the following equation
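The equation itself did not survive extraction, so the following is a hedged reconstruction of the standard gradient-based attack objective, with FGSM as its best-known one-step solution (notation assumed: $x$ is the input, $y$ the true label, $f$ the classifier, $\mathcal{L}$ the loss, and $\epsilon$ the noise budget):

```latex
% general objective: find the perturbation delta, within budget epsilon,
% that maximizes the loss on the correct label y
\delta^{*} = \arg\max_{\|\delta\|_{p} \le \epsilon} \mathcal{L}\bigl(f(x + \delta),\, y\bigr)

% one-step FGSM solution under the l-infinity constraint
x_{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_{x}\,\mathcal{L}(f(x), y)\bigr)
```

Iterative variants such as PGD and MIM repeat a small FGSM-like step several times, projecting back into the $\epsilon$-ball after each step.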
Various methods have been proposed to defend against adversarial attacks. Some remove the noise from the attacked input data, as in this research, while others, such as adversarial training, include attacked data in the training process. This research compares these methods and their performance.
The proposed method (TRN)
TRN defends against adversarial attacks by removing the noise added by the attack and restoring the original image. In summary, the following architecture is used to defend against the attack.
To recover the original image, TRN uses the attacked image and the gradient of the image at the time of the attack. Since the target is gradient-based attack methods, the gradient, which is the very information used in the attack, is also used for restoration. However, while the attacker knows the correct label of the input data, the defender does not. Without this label, the loss function, and hence the gradient, cannot be computed correctly. The defender therefore computes the gradient for every possible label. We call this collection a gradient map, and it is the input to the model.
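The per-label gradient computation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a toy linear-softmax classifier so that the per-class input gradient has the closed form $W^\top(p - \mathrm{onehot}(c))$, whereas the actual TRN would backpropagate through a deep network once per candidate label.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gradient_map(x, W, b):
    """Gradient of the cross-entropy loss w.r.t. the input x, computed
    once per candidate label c (the defender does not know the true one).
    For a linear softmax classifier (logits = W @ x + b) this gradient is
    W.T @ (p - onehot(c)), where p are the predicted probabilities."""
    p = softmax(W @ x + b)
    num_classes = W.shape[0]
    maps = np.zeros((num_classes, x.size))
    for c in range(num_classes):
        onehot = np.zeros(num_classes)
        onehot[c] = 1.0
        maps[c] = W.T @ (p - onehot)  # one gradient "channel" per label
    return maps

# toy usage: 10 classes, 32x32 flattened grayscale input
rng = np.random.default_rng(0)
C, D = 10, 32 * 32
W = rng.normal(size=(C, D)) * 0.01
b = np.zeros(C)
x = rng.normal(size=D)
g = gradient_map(x, W, b)
print(g.shape)  # → (10, 1024)
```

Stacking these per-class gradients channel-wise gives the gradient map that TRN consumes alongside the attacked image.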
As an experiment to verify that this gradient map works correctly, an experiment to visualize the gradient map for CIFAR10 has been performed. The result is shown in the figure below.
This figure shows a gradient map computed from an image that originally belonged to class 6 but was recognized as class 8 due to the attack. The gradient map of the misrecognized class shows features not found in the other classes. Therefore, the information in the gradient map can be used to determine whether an image has been subjected to an adversarial attack.
To train a TRN, the gradient map and the attacked image are used as input data, and the original image is used as the correct data. The part called Fusion Block in the TRN overview diagram has the following structure.
The Fusion Block combines the information in the gradient map with that in the attacked image. A connection called the Fusion Connection shares information between the two streams, and residual connections allow learning to proceed even when many Fusion Blocks are stacked on top of each other.
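The data flow just described can be sketched as below. This is a hypothetical simplification: the paper's Fusion Block is convolutional, while here dense matrices `Wi`, `Wg`, `Wf` stand in for the learned layers, so only the two-stream / fusion-connection / residual structure is illustrated.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fusion_block(img_feat, grad_feat, Wi, Wg, Wf):
    """One Fusion Block sketch (Wi, Wg, Wf are hypothetical dense layers
    standing in for the paper's convolutions):
    - each stream (image, gradient map) is transformed separately,
    - a fusion connection mixes the concatenated streams,
    - a residual connection on the image stream keeps training stable
      when many blocks are stacked."""
    hi = relu(Wi @ img_feat)                       # image stream
    hg = relu(Wg @ grad_feat)                      # gradient-map stream
    fused = relu(Wf @ np.concatenate([hi, hg]))    # fusion connection
    return img_feat + fused, hg  # residual add; gradient stream passes on

# stack several blocks (the count is a free hyperparameter, see below)
rng = np.random.default_rng(1)
D = 64
blocks = [(rng.normal(size=(D, D)) * 0.05,
           rng.normal(size=(D, D)) * 0.05,
           rng.normal(size=(D, 2 * D)) * 0.05) for _ in range(4)]
img, grad = rng.normal(size=D), rng.normal(size=D)
for Wi, Wg, Wf in blocks:
    img, grad = fusion_block(img, grad, Wi, Wg, Wf)
print(img.shape)  # → (64,)
```

Because each block maps its inputs back to the same shapes, any number of blocks can be chained, which is the flexibility examined later in the article.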
Performance comparison with other defense methods
For each attack method, we compare the various defense methods that have been proposed so far with the proposed method. The results of the comparison are shown in the table below.
From this table, we can see that the proposed method is more robust than the other defenses across all tested attacks and datasets.
Performance differences caused by the attack algorithm used during training
Next, the following table examines how the choice of attack method used to create adversarial examples during training affects the performance of TRN when it is attacked.
The left side shows the accuracy of plain adversarial training, and the right side shows the accuracy of the proposed method. From this table, we can see that the proposed method achieves higher accuracy than adversarial training regardless of which attack method is used for training and which is used at test time. The Variation column, the difference between the maximum and minimum accuracy, is also smaller than for adversarial training.
Is it really necessary to input both an image and a gradient map?
TRN takes both the attacked image and its gradient map as input; an ablation verifies whether both are really necessary. The results are shown in the table below.
This table compares accuracy under a PGD attack on CIFAR10 for four settings: no defense, the gradient map alone, the attacked image alone, and the full TRN. Accuracy is consistently higher when both types of information are used than when only one is used, which indicates that both the image and the gradient map are required as input for TRN.
TRN mixes its two inputs (image and gradient map) using a structure called the Fusion Block; because of this design, Fusion Blocks can be stacked as many times as desired. We examine how this flexibility benefits defense.
The table below shows the relationship between the number of Fusion Blocks and the accuracy of the attack using PGD for each dataset.
From this table, we can see that the number of Fusion Blocks achieving the highest accuracy differs across datasets. Thus, the ability to vary the number of Fusion Blocks as desired helps in defense.
An example of an image restored using TRN is shown in the figure below.
In this figure, the left column shows the original images, the middle column shows the attacked images, and the right column shows the images recovered by TRN. For CIFAR10, TRN removes the noise introduced by the adversarial attack. For Fashion-MNIST and SVHN, however, TRN instead adds a lattice pattern to neutralize the attack. The authors attribute this to the fact that, unlike CIFAR10, Fashion-MNIST and SVHN have simple data distributions, for which this gentler restoration of simply adding a lattice is sufficient.
This paper proposed a method that defends against gradient-based adversarial attacks by removing the attack noise from the attacked image. The model used for this denoising, TRN, determines the noise to remove from the input image and its gradient map. I found the restoration results on simple data distributions interesting and would like to see further validation of this method on a wider variety of datasets.