What Are The Data Sets And Evaluation Criteria To Fairly Evaluate Defense Mechanisms?

Adversarial Perturbation 13/07/2021

3 main points
✔️ Different images have different robustness to adversarial attacks
✔️ Defense mechanisms evaluated on images that are robust to begin with cannot be assessed correctly
✔️ The authors propose a dataset for fair benchmarking and a set of evaluation criteria

Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
written by Camilo Pestana, Wei Liu, David Glance, Ajmal Mian
(Submitted on 5 Nov 2020 (v1), last revised 7 Nov 2020 (this version, v2))
Comments: Accepted by WACV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Outline of Research

Adversarial attacks, in which the input to a machine learning model is manipulated to produce incorrect outputs, have recently attracted a great deal of research. Dataset bias is a problem in this research, especially when evaluating defense methods: it is impossible to tell whether the robustness observed in an evaluation comes from the defense method or from the dataset itself. This paper proposes a way to identify robust data, as well as data whose accuracy is easily recovered by applying a defense method, and introduces ImageNet-R, a dataset that collects robust data. It also proposes three metrics for measuring the robustness of data. Together, these enable unbiased benchmarking of adversarial attack and defense algorithms.

Related research

What is an adversarial attack?

An adversarial attack misleads the output of a model by tampering with its input data. This paper deals with image data, so the rest of this article focuses on images.

Adversarial attacks on image data mislead the model by adding noise to the input image that is imperceptible to a human observer. Attack methods differ in how they find this noise. For more details, please refer to this article.

Methods of defending against adversarial attacks

Various methods have been proposed to defend against adversarial attacks. Broadly speaking, the following three are considered the most promising.

Adversarial training
Search for robust architectures
Image preprocessing

Adversarial training aims to make a model robust to adversarial attacks by including samples created by those attacks (adversarial samples) in the training data. Although this method is very effective, it is prone to overfitting, and there is extensive research on how to solve this problem.

Because none of these defense methods has a unified evaluation standard, a defense evaluated on data that is already robust will appear to perform better than it actually does.

Defense-friendly data sets

This section examines the properties of data that is inherently robust to adversarial attacks or that responds well to defense methods. The authors identified three types of data that are easy to classify correctly.

Easy Images: images that every model classifies correctly when no perturbation is applied
ε-robust Images: images that every model still classifies correctly under perturbation, without any defense algorithm
Defense-friendly Images: images for which applying a defense algorithm yields a large recovery in accuracy

The perturbation here is the noise added to the image by an adversarial attack. A larger perturbation makes the attack stronger, but it also changes the image more, which makes the change easier for the human eye to notice.
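The three categories can be sketched as a simple filter over per-model results. This is only an illustration under assumptions: the function name `categorize`, the dictionary layout, and the rule used for "defense-friendly" (every model is correct again once the defense is applied) are hypothetical and are not taken from the paper.

```python
def categorize(clean_ok, attacked_ok, defended_ok):
    """Split images into the three categories.

    Each argument maps an image id to a list of booleans, one per model,
    saying whether that model classified the image correctly
    (clean input / attacked input / attacked input with a defense applied).
    """
    easy, eps_robust, defense_friendly = set(), set(), set()
    for img in clean_ok:
        if not all(clean_ok[img]):
            continue  # some model already fails without perturbation
        easy.add(img)
        if all(attacked_ok[img]):
            # every model survives the attack with no defense
            eps_robust.add(img)
        elif all(defended_ok[img]):
            # ASSUMPTION: "defense-friendly" = the defense restores every model
            defense_friendly.add(img)
    return easy, eps_robust, defense_friendly


# Hypothetical results for four images and two models.
clean = {"a": [True, True], "b": [True, True], "c": [True, False], "d": [True, True]}
attacked = {"a": [True, True], "b": [False, True], "c": [False, False], "d": [False, False]}
defended = {"a": [True, True], "b": [True, True], "c": [True, True], "d": [False, True]}

easy, eps_robust, defense_friendly = categorize(clean, attacked, defended)
print(sorted(easy), sorted(eps_robust), sorted(defense_friendly))
```

In this toy data, image "c" is excluded from every category because one model misclassifies it even without perturbation, and image "d" is easy but neither ε-robust nor defense-friendly.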

An example of these datasets might look something like this

The ε-robust Images are obtained by attacking the Easy Images with PGD (Projected Gradient Descent) and keeping those that all models still classify correctly without any defense algorithm. With $\epsilon = 0.01$, 15554 images remained correctly classified, and these are treated as the robust dataset.
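To make the role of ε concrete, here is a minimal sketch of an L∞ PGD attack. It is not the paper's setup: the target is a toy logistic-regression classifier (an assumption chosen so the gradient can be written by hand), there is no random start, and the names `pgd_linf` and `predict` and all numeric values are hypothetical.

```python
import math


def predict(x, w, b):
    """Toy binary classifier: class 1 if w.x + b > 0, else class 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def pgd_linf(x, y, w, b, eps, alpha, steps):
    """L-infinity PGD against p = sigmoid(w.x + b).

    x: pixel values in [0, 1]; y: true label (0 or 1);
    eps: perturbation budget; alpha: step size.
    """
    x_adv = list(x)
    for _ in range(steps):
        z = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            xi = x_adv[i] + alpha * sign               # ascend the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)  # project into the eps-ball
            x_adv[i] = min(max(xi, 0.0), 1.0)          # keep a valid pixel range
    return x_adv


w, b = [2.0, -2.0], 0.0
x, y = [0.6, 0.4], 1

small = pgd_linf(x, y, w, b, eps=0.05, alpha=0.02, steps=5)
large = pgd_linf(x, y, w, b, eps=0.2, alpha=0.05, steps=10)
print(predict(x, w, b), predict(small, w, b), predict(large, w, b))
```

With the small budget the toy input keeps its label, so it would count as ε-robust at that ε; with the larger budget the prediction flips.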

Criteria for evaluating the robustness of a data set

In this section, we describe the evaluation criteria for the robustness of the data itself. The authors proposed three criteria for this evaluation.

Adversarial Robust Dataset (ARD)
Adversarial Minimum Perturbation (AMP)
Adversarial Defense-friendly (ADF)

ARD is the fraction of the dataset that remains robust under a given attack on a given model with perturbation ε. AMP is the smallest perturbation at which a particular attack on the model M succeeds; the larger this value, the easier the defense. ADF is the fraction of images in the dataset that a defense can recover from an attack with a small ε. Recovery here means that an image the attack caused to be misclassified is classified correctly again once the defense method is applied.
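The three scores can be sketched from boolean classification results. This follows the verbal descriptions above rather than the paper's exact formulas, so several details are assumptions: `amp` is computed per image over a hypothetical grid of ε values, and `adf` divides by the size of the whole dataset (normalizing by the number of fooled images would be another reasonable reading).

```python
def ard(attacked_ok):
    """Fraction of images still classified correctly under attack at a fixed eps."""
    return sum(attacked_ok) / len(attacked_ok)


def amp(correct_at_eps):
    """Smallest eps at which the attack fools the model for one image.

    correct_at_eps maps each tested eps to True if the image is still
    classified correctly. Returns None if no tested eps fools the model.
    """
    fooled = [eps for eps, ok in correct_at_eps.items() if not ok]
    return min(fooled) if fooled else None


def adf(attacked_ok, defended_ok):
    """Fraction of the dataset that the attack breaks and the defense recovers.

    ASSUMPTION: the denominator is the whole dataset.
    """
    recovered = sum((not a) and d for a, d in zip(attacked_ok, defended_ok))
    return recovered / len(attacked_ok)


# Hypothetical results for four images at one eps.
attacked_ok = [True, False, False, True]
defended_ok = [True, True, False, True]

print(ard(attacked_ok))               # 2 of 4 images survive the attack
print(adf(attacked_ok, defended_ok))  # 1 of 4 images is recovered
print(amp({0.01: True, 0.02: True, 0.04: False, 0.08: False}))
```

The cost of these scores comes from the inputs rather than the arithmetic: every boolean requires running an attack (and, for ADF, a defense) against a model.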

Six different subsets (robust and non-robust images) were created from randomly selected images in the dataset. The ARD, AMP, and ADF scores for these subsets are shown below.

NR denotes non-robust images and R denotes robust images. As you can see, the ARD and ADF scores are higher for the robust images.

Although these scores are valid metrics, they are computationally expensive, so a cheaper alternative is needed. The authors address this by training a model that predicts whether an image is robust or non-robust.

The results of training traditional ML models and deep learning models for this prediction are as follows.

The deep learning models used a CNN to extract image features, while the ML models used GLCM (gray-level co-occurrence matrix), a statistical feature extraction method. The CNN, being the newer method, was expected to be by far the most accurate, but the best performer was an ML model, which achieved an accuracy of 75% using only statistical features extracted from grayscale images. The grayscale image is the Y channel of YCbCr. The results show that a model using GLCM features extracted from the Y channel can recognize whether an image is robust or not in the majority of cases. Note that the deep learning models take RGB images as input, while the ML models take grayscale images. From the point of view of building a simple predictive model, a model using GLCM features extracted from the Y channel is therefore more appropriate.
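As a rough illustration of what GLCM features from the Y channel look like, the sketch below computes a co-occurrence matrix for horizontally adjacent pixels and three common texture statistics. The article does not give the paper's GLCM settings, so the number of gray levels, the single pixel offset, the choice of statistics, and all function names are assumptions; the luma weights are the standard BT.601 ones used for YCbCr. Images must be at least two pixels wide.

```python
def luma(rgb):
    """Y channel of YCbCr (BT.601 weights) for rows of (R, G, B) tuples in 0..255."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def glcm(gray, levels=4):
    """Normalized co-occurrence matrix for horizontally adjacent pixel pairs."""
    # Quantize 0..255 intensities into a small number of gray levels.
    q = [[min(int(v * levels / 256), levels - 1) for v in row] for row in gray]
    counts = [[0] * levels for _ in range(levels)]
    for row in q:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def glcm_features(p):
    """Contrast, angular second moment (ASM), and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Two hypothetical 2x4 images: uniform gray, and black/white vertical stripes.
flat = [[(128, 128, 128)] * 4 for _ in range(2)]
stripes = [[(0, 0, 0), (255, 255, 255)] * 2 for _ in range(2)]

flat_features = glcm_features(glcm(luma(flat)))
stripe_features = glcm_features(glcm(luma(stripes)))
print(flat_features)
print(stripe_features)
```

The uniform image has zero contrast while the striped image has high contrast. A small feature vector of this kind could then be fed to a conventional classifier to predict robust versus non-robust, which is the role the ML models play in the comparison above.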

Summary

This paper shows that there exist defense-friendly images, which are resilient to adversarial attacks and for which the accuracy of an attacked model is recovered more easily than for other images. Because defense methods evaluated on datasets containing many such images will be overestimated, the authors proposed metrics for evaluating the robustness of datasets. To reduce the computational cost of these metrics, they also proposed a predictive model that determines whether an image is robust or non-robust.

The observation that defense methods are overestimated when the dataset itself is already robust is very interesting, and future research on defense methods can be expected to take it into account.