
Do Backdoor Attacks On Deep Learning Models Work In The Real World?


3 main points
✔️ Demonstrated a backdoor attack on DNN models using real-world objects
✔️ Successfully conducted physical backdoor attacks on ResNet and other DNN models
✔️ Confirmed that existing defenses against backdoor attacks do not work effectively

Backdoor Attacks Against Deep Learning Systems in the Physical World
written by Emily Wenger, Josephine Passananti, Arjun Bhagoji, Yuanshun Yao, Haitao Zheng, Ben Y. Zhao
(Submitted on 25 Jun 2020 (v1), last revised 7 Sep 2021 (this version, v4))
Comments: Accepted to the 2021 Conference on Computer Vision and Pattern Recognition (CVPR 2021).

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

code:   

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

Adversarial attacks on deep neural networks (DNNs) include evasion attacks, which perturb input data so that the model produces erroneous outputs, and data poisoning (backdoor) attacks, which mix special samples into the training dataset so that the model produces erroneous outputs for inputs containing a specific trigger.

However, existing attacks require specific digital processing of the input data, and it may not be practical to perform such attacks on models deployed in the real world. In this article, we introduce a paper that studies physical backdoor attacks triggered by wearable accessories rather than by the digitally processed triggers used in existing methods.

About setting up a backdoor attack

To begin, we'll discuss setting up a physical backdoor attack.

Existing backdoor attacks are based on the assumption that the attacker has no knowledge of the weights or architecture of the model being trained and can inject a small number of "dirty label" samples into the training data. For the physical backdoor attack, we make two additional assumptions.

  • An attacker can collect images of a person in the training data wearing some trigger object.
  • The attacker can perform data poisoning on all classes.

For the latter assumption, the paper also verifies the case where data poisoning can be performed on only a subset of classes.

On datasets of physical backdoor attacks

Since no dataset for physical backdoor attacks exists, the original paper collects its own face recognition dataset.

About Trigger Object

Triggers for physical backdoor attacks include objects that are readily available and vary in size and color, such as colored round stickers, sunglasses, tattoos, white tape, bandanas, and earrings. These trigger objects can also be placed in various locations on the face.

The collected dataset includes 535 clean images and 2,670 poison images from ten volunteers of different races and genders (see the original paper for example images).

About the backdoor attack

The attacker can inject poison data during the training of the model. In the original paper, following the BadNets approach, for a particular target label $y_t$ the attacker appends $m$ poison images (each containing a particular trigger $\delta$) to the $n$ clean images in the original dataset. Here, the backdoor injection rate, $\frac{m}{n+m}$, is an important metric for measuring the attacker's capabilities.

In this case, the goal when training the model is expressed by the following equation:

$$\min_{\theta} \sum_{i=1}^{n} l\left(F_{\theta}(x_i), y_i\right) + \sum_{j=1}^{m} l\left(F_{\theta}(x^{\prime}_j), y_t\right)$$

where $l$ is the training loss function (cross-entropy in this setting), $F_{\theta}$ is the model with parameters $\theta$, $(x_i, y_i)$ are the clean data-label pairs, and $(x^{\prime}_j, y_t)$ are the poison data-label pairs.
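As a minimal sketch, this objective amounts to mixing poison samples relabeled to the target class $y_t$ into each training batch and minimizing the usual cross-entropy loss. The function and variable names below are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def poisoned_training_step(model, optimizer, clean_x, clean_y, poison_x, target_label):
    """One gradient step on a mixed clean + poison batch (illustrative sketch).

    clean_x, clean_y : batch of clean images and their true labels
    poison_x         : batch of images containing the physical trigger
    target_label     : the attacker's target class y_t
    """
    # Poison images are all relabeled to the attacker's target class y_t.
    poison_y = torch.full((poison_x.size(0),), target_label,
                          dtype=torch.long, device=poison_x.device)

    # The combined objective sums the loss over clean pairs (x_i, y_i)
    # and the loss over poison pairs (x'_j, y_t).
    x = torch.cat([clean_x, poison_x], dim=0)
    y = torch.cat([clean_y, poison_y], dim=0)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```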

Model training settings

When creating the dataset, we first split the clean data into train/test sets at a ratio of 80:20 and inject randomly selected poison data into the training set until the target injection rate is reached.

The remaining poison data is used to calculate the attack success rate during testing. Also, because the training set is small, transfer learning and data augmentation are used when training the model (see the original paper for details).
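A rough sketch of this data preparation is shown below, assuming the clean and poison images are available as plain Python lists; the function name and split logic are illustrative only.

```python
import random

def prepare_splits(clean_images, poison_images, injection_rate=0.25, seed=0):
    """Split clean data 80:20 and add poison samples to the training set
    until the target injection rate m / (n + m) is reached (sketch)."""
    rng = random.Random(seed)
    clean = clean_images[:]
    rng.shuffle(clean)

    # 80:20 split of the clean data.
    n_train = int(0.8 * len(clean))
    clean_train, clean_test = clean[:n_train], clean[n_train:]

    # Number of poison samples m needed so that m / (n + m) = injection_rate.
    n = len(clean_train)
    m = int(round(injection_rate * n / (1.0 - injection_rate)))

    poison = poison_images[:]
    rng.shuffle(poison)
    poison_train = poison[:m]
    # The remaining poison images are held out to measure the attack success rate.
    poison_eval = poison[m:]

    return clean_train + poison_train, clean_test, poison_eval
```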

Experimental results

In the following experiments, we use three DNN architectures (VGG16, ResNet50, and DenseNet) to verify physical backdoor attacks.

To begin, the performance of the VGG16 model when triggered poison data is injected at a given injection rate is shown below.

The purple line shows the accuracy of the model, and the light blue line shows the success rate of the attack. For trigger objects other than the earrings (far right), the attack succeeds without significantly reducing the model's accuracy. In addition, the following results are obtained when the attack is performed on all three models with an injection rate of 25%.

In general, we found that backdoor attacks using physical triggers work well for all trigger objects except the earrings.
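For reference, the two quantities reported here, clean model accuracy and attack success rate, can be computed with a simple sketch like the one below, assuming a standard PyTorch classifier and data loaders; the names are illustrative.

```python
import torch

@torch.no_grad()
def clean_accuracy(model, clean_loader, device="cpu"):
    """Fraction of clean test images classified as their true label."""
    correct, total = 0, 0
    for x, y in clean_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.size(0)
    return correct / total

@torch.no_grad()
def attack_success_rate(model, poison_loader, target_label, device="cpu"):
    """Fraction of held-out trigger images classified as the target label y_t."""
    hits, total = 0, 0
    for x, _ in poison_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        hits += (pred == target_label).sum().item()
        total += x.size(0)
    return hits / total
```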

On the Failure of Physical Backdoor Attacks

Next, we take a closer look at the earrings, the one trigger object among those tested for which the attack was not effective.

To begin with, the CAM (Class Activation Map) of the model for the image with the earrings is shown below.

As shown in the figure, the model focuses mainly on the face region of the image. Earrings, which lie outside this region, therefore have little influence on the classification result, which may explain the low success rate of this attack.
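For readers who want to produce this kind of visualization, a minimal sketch of a standard CAM computation for a ResNet-style backbone (last convolutional block followed by global average pooling and a fully connected layer) is shown below; this is an illustrative example, not necessarily the exact pipeline used in the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def class_activation_map(model, image, class_idx):
    """CAM for a torchvision ResNet: weight the last conv feature maps by the
    fully connected weights of the chosen class. `image` is a 1xCxHxW tensor."""
    model.eval()
    features = {}

    # Capture the output of the last convolutional block.
    def hook(_module, _inp, out):
        features["maps"] = out  # shape: (1, K, h, w)

    handle = model.layer4.register_forward_hook(hook)
    model(image)
    handle.remove()

    maps = features["maps"][0]            # (K, h, w)
    weights = model.fc.weight[class_idx]  # (K,)

    # Weighted sum of the feature maps, then upsample to the input resolution.
    cam = F.relu(torch.einsum("k,khw->hw", weights, maps))
    cam = cam / (cam.max() + 1e-8)
    return F.interpolate(cam[None, None], size=image.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]

# Usage (illustrative):
# from torchvision.models import resnet50
# model = resnet50(weights="IMAGENET1K_V2")
# cam = class_activation_map(model, preprocessed_image, predicted_class)
```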

In fact, the results of placing other trigger objects inside and outside the face are shown below.

As shown in the table, we found that the physical backdoor attack works better when the trigger object is placed within the face region.

About the case where the attackable classes are limited

The results for the case where poison data can be injected into only some classes of the data set are as follows.

In this table, we show the results when the attackable classes are limited to only 10 out of 75 classes in the entire data set.

Even in this setting, the attack success rate is high, which reveals the effectiveness of physical backdoor attacks.

On Defending Against Physical Backdoors

The next question that comes to mind is: if physical backdoor attacks are effective, can they be defended against?

To address this question, the results of using existing defense methods against backdoor attacks are as follows.

This table shows the percentage of poison data detected by existing defense methods.

In general, because of the differences between digital and physical triggers, existing defense methods do not work well against physical trigger objects.
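To illustrate how one representative family of existing defenses operates, the sketch below implements an activation-clustering-style detector: for each class, penultimate-layer activations are clustered into two groups, and the smaller group is flagged as suspected poison. The interface and names here are assumptions for this sketch, not the exact implementations evaluated in the paper.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def flag_suspected_poison(feature_extractor, images, labels, num_classes):
    """Activation-clustering-style detection sketch.

    feature_extractor : maps a batch of images to penultimate-layer features
    images, labels    : training data tensors (N x C x H x W, N)
    Returns a boolean array marking samples suspected to be poison.
    """
    feats = feature_extractor(images).cpu().numpy()
    labels = labels.cpu().numpy()
    suspected = np.zeros(len(labels), dtype=bool)

    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        if len(idx) < 2:
            continue
        clusters = KMeans(n_clusters=2, n_init=10).fit_predict(feats[idx])
        # Heuristic: the smaller of the two clusters is treated as suspected poison.
        minority = 0 if (clusters == 0).sum() < (clusters == 1).sum() else 1
        suspected[idx[clusters == minority]] = True
    return suspected
```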

Summary

Existing backdoor attacks against DNN models have mainly been limited to those using digitally processed triggers.

However, this study shows that physical backdoor attacks triggered by real-world objects can actually work effectively.

This could pose a serious threat to a variety of models operating in the real world, making the development of defenses against physical backdoor attacks an important challenge.
