Adversarial Attack Techniques That Delete Information!
3 main points
✔️ Conventional adversarial attacks cause a DNN model to misclassify by adding noise to the original data
✔️ The proposed method attacks by removing information from the original data instead
✔️ The generated adversarial samples are shown to be more resistant to current defense methods than other attacks
AdvDrop: Adversarial Attack to DNNs by Dropping Information
written by Ranjie Duan, Yuefeng Chen, Dantong Niu, Yun Yang, A. K. Qin, Yuan He
(Submitted on 20 Aug 2021)
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
An attack that causes a DNN model to misclassify is called an adversarial attack, and conventionally it is carried out by adding misclassification-inducing noise to the original data. In this paper, the authors propose an approach that attacks by removing information from the original data instead of adding noise. The idea is that DNNs, unlike humans, are not yet fully capable of recognizing abstract objects: data that has been abstracted yet remains identifiable to a human can act as a kind of adversarial sample. The authors test the extent to which adversarial samples generated by removing information affect a DNN.
The contributions in this paper can be summarized as follows.
- Proposed AdvDrop, a method that generates adversarial images by removing information from images.
- Validated the effectiveness of AdvDrop in both targeted and untargeted attacks, showing that the adversarial samples generated by AdvDrop are more resistant to current defense techniques than those of other attacks.
- Visualized the dropped information and the DNN's attention to interpret the adversarial samples generated by AdvDrop.
AdvDrop is a method composed of several parts, as shown in the figure below.
- DCT (Discrete Cosine Transform): DCT transforms the input image from the spatial domain to the frequency domain.
- Quantization: Quantization is the core process of dropping information by applying a quantization table created based on adversarial loss.
- IDCT (Inverse Discrete Cosine Transform): IDCT inversely transforms the image signal from the frequency domain to the spatial domain.
- Adversarial loss: the proposed method optimizes the quantization table by minimizing the adversarial loss.
AdvDrop generates the adversary image in the flow shown in the figure above. First, the input image is transformed from the spatial domain to the frequency domain using DCT and then quantization is performed to drop certain frequencies of the transformed image. Then optimization is done by inverse transforming the frequency signal of the image into the spatial domain using IDCT. During optimization, the values of the quantization table are adjusted.
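The DCT → quantization → IDCT flow above can be sketched on a single 8×8 block with NumPy. This is a minimal illustration, not the authors' implementation: the block values, the uniform quantization table, and the function names are assumptions for the example, and the real method optimizes a per-frequency table rather than using a fixed one.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (JPEG-style block transform)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has a different normalization
    return m

def advdrop_step(block, q_table):
    """One forward pass of the AdvDrop-style pipeline on an 8x8 block:
    DCT -> quantize with q_table (this is where information is dropped) -> IDCT."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                       # spatial -> frequency domain
    quantized = np.round(coeffs / q_table) * q_table  # drop information per frequency
    return C.T @ quantized @ C                     # frequency -> spatial domain

# Toy example: a vertical-gradient block and a uniform quantization table.
block = np.outer(np.arange(8.0), np.ones(8)) * 16
q_table = np.full((8, 8), 32.0)  # larger entries drop more information
recon = advdrop_step(block, q_table)
```

With a larger `q_table`, more frequency coefficients collapse to the same quantization point and the reconstruction loses detail; with a very small table the round trip is nearly lossless.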
For the adversarial loss, the cross-entropy error is used.
By minimizing the adversarial loss, the quantization table is optimized to selectively remove information from the input image so that the target model misclassifies it.
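The role of the cross-entropy term can be illustrated with a small sketch (an assumption-laden illustration, not the authors' code): in the targeted setting the cross-entropy toward the target label is minimized, while in the untargeted setting the cross-entropy toward the true label is maximized, i.e. its negative is minimized.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy of one example from raw logits (numerically stable softmax)."""
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def adversarial_loss(logits, label, targeted):
    """Loss an AdvDrop-style optimization would minimize.
    targeted=True:  `label` is the target class; pull the prediction toward it.
    targeted=False: `label` is the true class; push the prediction away from it."""
    ce = cross_entropy(logits, label)
    return ce if targeted else -ce
```

As a sanity check, the targeted loss shrinks as the target-class logit grows, and the untargeted loss is simply the negated cross-entropy of the true class.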
Quantization is done by two operations: rounding and truncation. The former maps the original value to the nearest quantization point, while the latter limits the range of values to be quantized. In general, quantization can be written as Q(x) = Δ · round(x / Δ), where Δ is the quantization step size.
The quantization table corresponds to Δ in the above equation. After the coefficients are divided by the table entries, the information is reduced by rounding and truncation.
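The two operations can be written out directly. This is a minimal sketch: the truncation bounds `t_min`/`t_max` are assumed values for illustration, and the paper additionally replaces the hard rounding with a differentiable approximation so the table can be optimized by gradient descent.

```python
import numpy as np

def quantize(x, delta, t_min=-255.0, t_max=255.0):
    """Q(x) = delta * round(x / delta), followed by truncation.
    Rounding maps each value to the nearest quantization point (spacing delta);
    truncation clips the result to the representable range [t_min, t_max]."""
    rounded = np.round(x / delta) * delta
    return np.clip(rounded, t_min, t_max)

x = np.array([3.2, 17.8, -300.0, 64.0])
quantize(x, delta=10.0)  # -> [0., 20., -255., 60.]
```

A larger `delta` spaces the quantization points further apart, so more distinct input values collapse to the same point and more information is dropped.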
We first evaluate the perceptual quality and attack performance of AdvDrop, then evaluate its performance under different defense techniques. Finally, we analyze the information dropped by AdvDrop together with the model's attention.
As the bound on the quantization table is increased, detailed information gradually disappears, as shown in the figure below.
We then compare the adversarial samples generated by AdvDrop with those from other attack methods. LPIPS is employed as a perceptual metric: it measures how similar two images are in a way consistent with human judgment, and lower values indicate less perceptual difference. The figure below plots the LPIPS perceptual loss on the y-axis against the percentage change in size of the generated image relative to the original on the x-axis. For example, for AdvDrop-100, the x-axis value shows that the adversarial image is on average 36.32% smaller than the original image. For PGD, in contrast, the generated adversarial image is larger than the original, so its x-axis value represents how much the size ratio increased. As the figure shows, in both settings the adversarial images generated by AdvDrop are perceptually closer to the original images than those of PGD, even though their relative size changes more.
Attack Performance Evaluation
Next, we evaluate the performance of AdvDrop in both the targeted and untargeted settings. Three constraint levels on the quantization table were prepared and each was evaluated. The results are shown in the table below.
As the table shows, relaxing the constraint ε improves the success rate of AdvDrop for both targeted and untargeted attacks. When ε is 100, an almost 100% success rate is achieved.
We can also see from the figure below that in the targeted setting, more steps are required for a successful attack than in the untargeted setting.
AdvDrop under Defense Methods
In this section, we evaluate the effectiveness of the proposed AdvDrop against various defense techniques by comparing it with other adversarial attacks. First, adversarial samples are generated with attacks such as PGD, BIM, C&W, FGSM, and DeepFool. Then, defense methods such as adversarial training and JPEG compression are applied to these samples, and the strength of each attack under defense is evaluated. The results are shown in the following table.
The results show that the proposed method maintains a higher attack success rate than the other attack methods under each of the applied defenses.
Visualization and Analysis
We examined where, and what kind of, information AdvDrop removes in a given image. To do so, we visualized the model's attention alongside the amount of information AdvDrop drops in different regions of the image. The results are shown in the figure below.
In the first case in this figure, the model focuses primarily on the flower part, and AdvDrop drops both the calyx and the flower part. In the second case, we can see that the model focuses on the penguin's head, but AdvDrop mainly drops the information on the body part, which has rich textural details about the penguin's fur.
In this paper, the authors investigate adversarial robustness from a new perspective and propose AdvDrop, an approach that creates adversarial samples by dropping existing details from an image. The authors plan to explore other ways of removing information from images, so further developments are worth watching.