
A Research On Attacks And Defenses Against Neural Networks

Posted on: 2021-05-08  Degree: Master  Type: Thesis
Country: China  Candidate: P C Li  Full Text: PDF
GTID: 2428330647951048  Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of deep learning and the successful application of deep neural networks (DNNs) in image classification, research on the robustness of DNNs has flourished and receives growing attention in the field of DNN attack and defense. Because DNNs are differentiable, they are vulnerable to adversarial attack: an attacker constructs adversarial examples by adding small perturbations to the raw input, which appear unmodified to human eyes but are misclassified by a well-trained classifier. Researchers can also improve a model's robustness against this type of attack by exploiting the properties of obfuscated gradients. The research on neural network robustness in this thesis is therefore divided into an attack part and a defense part, with the following progress.

In the field of attack, we focus on the black-box setting, where attackers have almost no access to the underlying model. A popular approach to black-box attack is to train a substitute model on information queried from the target DNN. The substitute model can then be attacked with existing white-box methods, and the generated adversarial examples are used to attack the target DNN. Despite its encouraging results, this approach suffers from poor query efficiency: attackers usually need a huge number of queries to collect enough information to train an accurate substitute model. To this end, we first use state-of-the-art white-box attack methods to generate samples for querying, and then introduce an active learning strategy to significantly reduce the number of queries needed. We also propose a diversity criterion to avoid sampling bias. Extensive experiments on MNIST and CIFAR-10 show that the proposed method reduces queries by more than 90% while preserving the attack success rate, and obtains a substitute model that agrees with the target oracle on more than 85% of inputs.

In the field of defense, we improve the robustness of DNNs with techniques from distance metric learning. Specifically, we incorporate triplet loss, one of the most popular distance metric learning methods, into the framework of adversarial training. Our proposed algorithm, Adversarial Training with Triplet Loss (AT2L), substitutes the adversarial example against the current model for the anchor of the triplet loss, which effectively smooths the classification boundary. We further propose an ensemble version of AT2L that aggregates different attack methods and model structures for better defense. Our empirical studies verify that the proposed approach achieves an over 85% defense success rate without sacrificing accuracy on MNIST. Finally, we demonstrate that our specially designed triplet loss can also serve as a regularization term to enhance other defense methods.
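The abstract does not spell out the exact active-learning criterion, so the following is only an illustrative sketch of how query selection for substitute-model training might combine uncertainty with a diversity filter: candidates are ranked by the margin between the substitute model's top-two class probabilities (small margin = most informative to query), and a greedy distance check skips near-duplicate samples to avoid sampling bias. The function names, the margin score, and the Euclidean distance threshold are all assumptions for illustration, not the thesis's actual method.

```python
import numpy as np

def select_queries(probs, features, k, min_dist=0.5):
    """Pick k samples to send to the target oracle.

    probs    -- (n, c) substitute-model class probabilities per sample
    features -- (n, d) feature vectors used for the diversity check
    k        -- number of queries to select

    Uncertainty: smallest gap between the top-2 predicted probabilities.
    Diversity:   greedily skip candidates closer than min_dist (Euclidean)
                 to any already-selected sample.
    """
    sorted_p = np.sort(probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]   # small margin = uncertain
    order = np.argsort(margin)                   # most uncertain first
    chosen = []
    for idx in order:
        if all(np.linalg.norm(features[idx] - features[j]) >= min_dist
               for j in chosen):
            chosen.append(idx)
        if len(chosen) == k:
            break
    return chosen
```

Under this sketch, the selected samples would be labeled by querying the target DNN and then added to the substitute model's training set, repeating until the query budget is exhausted.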
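The core idea of AT2L, as described above, is to use the adversarial example as the anchor of the triplet loss: its embedding is pulled toward a clean same-class positive and pushed away from a different-class negative. A minimal sketch of that loss on precomputed embeddings, with a one-step FGSM-style perturbation standing in for whatever attack generates the anchor (the embedding network, the attack method, and the margin value are simplifying assumptions, not the thesis's exact setup):

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    # One-step FGSM-style perturbation: move each input coordinate by
    # eps in the direction of the sign of the loss gradient.
    return x + eps * np.sign(grad)

def at2l_triplet_loss(anchor_adv, positive, negative, margin=1.0):
    """Triplet loss with the adversarial example as anchor.

    anchor_adv -- embedding of the adversarial example (the anchor)
    positive   -- embedding of a clean sample from the same class
    negative   -- embedding of a sample from a different class

    Loss is zero once the anchor sits at least `margin` closer to the
    positive than to the negative, which encourages a smoother boundary
    around adversarial points.
    """
    d_pos = np.linalg.norm(anchor_adv - positive)
    d_neg = np.linalg.norm(anchor_adv - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In the ensemble version described above, one would aggregate such losses over anchors generated by several attack methods and model structures; that aggregation step is omitted here.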
Keywords/Search Tags:Deep Learning, Adversarial Examples, Black-Box Attack, Active Learning, Metric Learning