Adversarial attacks aim to generate imperceptible noise that, when added to an image, produces an adversarial example capable of fooling a classifier. Research on adversarial attacks falls into two main directions: (1) the white-box setting, in which the structure and parameters of the target model are available to the adversary; and (2) the black-box setting, in which the adversary can only query the final output of the DNN to optimize adversarial examples. As the counterpart of adversarial attack, adversarial defense seeks to defend against adversarial examples. Attack and defense are two sides of the same coin and jointly advance the study of AI robustness. With the continued development of deep learning models across various fields, the threat posed by adversarial attacks has become increasingly prominent. An adversary can cause models to produce misjudgments and incorrect predictions, compromising system security and threatening applications such as facial recognition, finance, and assisted driving. It is therefore urgent to improve model robustness and thereby promote the development of deep learning technology, making research on adversarial attack and adversarial defense of great practical significance. This thesis focuses on adversarial attack and adversarial defense, with the following contributions:

(1) An improved Meta Attack based on dynamic fine-tuning. Existing black-box methods use the input-output pairs of the target model as the information for gradient estimation. To avoid exposing the adversary's purpose and identity, and because the number of model queries is limited, existing black-box attack methods aim to reduce the number of model queries as much as possible. Meta Attack adopts a meta-learning mechanism: it trains a model to simulate the target model and output meta-gradients, which significantly reduces the number of model queries. However, it queries for ZOO-gradients to correct the estimated meta-gradients at a fixed
frequency, thereby still incurring massive unnecessary queries. To overcome this limitation, this thesis proposes Dynamic Meta Attack (DMA), which starts from the observation that the accuracy of the estimated gradients changes dynamically. At the beginning of each round of fine-tuning, DMA computes the distance between the meta-gradients and the ZOO-gradients. This distance metric reflects the accuracy of the meta-gradients and serves as a guide for dynamically adjusting the frequency of ZOO-gradient queries. In addition, the dynamic fine-tuning workflow is controlled by a small set of parameters that are easy to adjust. In this way, DMA launches queries only at critical moments, greatly saving query resources. Experiments show that the proposed DMA requires far fewer queries than existing methods while maintaining a satisfactory attack success rate and image distortion.

(2) An adversarial defense method based on feature consistency (FeConDefense). Existing adversarial defense methods rarely consider how adversarial perturbations affect the feature representations of images. Since different models extract different features from the same sample, it is worth exploring how an adversarial example differs across the feature spaces of different models, and whether these differences can help identify and neutralize adversarial perturbations. In other words, the consistency of an adversarial example's features across different models may be a key factor in adversarial defense. This thesis finds significant differences between the deep-layer features that two different models extract from the same adversarial example, and measures this divergence with a feature consistency loss (FeConLoss). Based on FeConLoss, this thesis proposes an adversarial defense that neutralizes adversarial perturbation by reducing the FeConLoss of adversarial examples. Experimental results show that the proposed FeConDefense improves model robustness.
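The dynamic fine-tuning schedule described in contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis' exact algorithm: the distance metric (cosine distance here), the thresholds, and the doubling/halving rule for the query interval are all illustrative assumptions.

```python
# Hypothetical sketch of DMA-style dynamic query scheduling.
# cosine_distance, adjust_query_interval, and all thresholds are
# illustrative assumptions, not the thesis' exact formulation.
import numpy as np

def cosine_distance(g_meta, g_zoo):
    """1 - cosine similarity between two gradient estimates;
    small values mean the meta-gradient is still accurate."""
    num = float(np.dot(g_meta.ravel(), g_zoo.ravel()))
    den = float(np.linalg.norm(g_meta) * np.linalg.norm(g_zoo)) + 1e-12
    return 1.0 - num / den

def adjust_query_interval(interval, distance, low=0.2, high=0.6,
                          min_interval=1, max_interval=16):
    """Lengthen the ZOO-query interval when meta-gradients are accurate
    (small distance), shorten it when they drift (large distance)."""
    if distance < low:
        return min(interval * 2, max_interval)
    if distance > high:
        return max(interval // 2, min_interval)
    return interval
```

Under this scheme, expensive ZOO-gradient queries are issued only when the measured distance signals that the meta-gradients have drifted, which is the intuition behind querying "only at critical moments".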
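The feature-consistency idea in contribution (2) can be sketched in the same spirit. This is a toy illustration under stated assumptions: the two "models" are linear feature extractors, FeConLoss is taken as the squared L2 distance between normalized deep features, and purification uses finite-difference gradient descent on the input; none of these choices is claimed to match the thesis' implementation.

```python
# Hypothetical sketch of FeConLoss and a purification loop that
# reduces it. The loss form and optimizer are illustrative assumptions.
import numpy as np

def fecon_loss(feat_a, feat_b):
    """Squared L2 distance between normalized features from two models."""
    a = feat_a / (np.linalg.norm(feat_a) + 1e-12)
    b = feat_b / (np.linalg.norm(feat_b) + 1e-12)
    return float(np.sum((a - b) ** 2))

def purify(x_adv, extract_a, extract_b, steps=50, lr=0.5, eps=1e-4):
    """Neutralize adversarial perturbation by descending FeConLoss
    with a finite-difference gradient on the input."""
    x = x_adv.astype(float).copy()
    for _ in range(steps):
        base = fecon_loss(extract_a(x), extract_b(x))
        grad = np.zeros_like(x)
        for i in range(x.size):          # numerical gradient, per coordinate
            x_p = x.copy()
            x_p[i] += eps
            grad[i] = (fecon_loss(extract_a(x_p), extract_b(x_p)) - base) / eps
        x_new = x - lr * grad
        if fecon_loss(extract_a(x_new), extract_b(x_new)) < base:
            x = x_new                    # accept the step
        else:
            lr *= 0.5                    # backtrack on overshoot
    return x
```

In a real defense the extractors would be deep-layer features of two trained networks and the descent would use autograd, but the sketch shows the core mechanism: driving the two models' feature representations of the input back toward agreement.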