
Research On Feature-Based Adversarial Examples For Security Of Deep Learning

Posted on: 2022-12-11    Degree: Master    Type: Thesis
Country: China    Candidate: X M Fu    Full Text: PDF
GTID: 2518306776992559    Subject: Automation Technology
Abstract/Summary:
With the increasing maturity of artificial intelligence techniques, deep learning has become one of the fundamental technologies of many modern systems, but its security is not guaranteed and is under serious threat from attacks such as adversarial examples. In image classification tasks, an attacker can induce a deep model to return a false prediction with high confidence simply by adding a small perturbation to an image. The perturbed image is called an adversarial example, and the process is called an adversarial attack. The existence of adversarial examples poses a significant risk to the secure operation of deep models, especially in security-critical applications, where a model compromised by an attacker can have catastrophic consequences. Therefore, on the one hand, studying how to improve the defense capability of deep models under adversarial attack is crucial for their deployment; on the other hand, studying how to improve the attack strength of adversarial examples can motivate more comprehensive and effective defense methods. This thesis first proposes, from the defense perspective, a novel adversarial training method that improves the robustness of deep models against adversarial attacks; it then proposes, from the attack perspective, an enhanced method for generating black-box transferable examples that effectively improves the transferability of adversarial examples. The work is briefly described as follows.

1. Adversarial training is one of the most commonly used and effective methods of robust defense today. This thesis visualizes the feature maps of the adversarial examples used in conventional adversarial training and compares them with those of the original examples, finding significant differences between the two in the feature space. Inspired by this observation, the thesis proposes a feature-based adversarial training method called FPAT, which divides each network iteration into two phases: phase 1 constructs adversarial training examples by maximizing a feature-map loss, and phase 2 uses the generated examples to update the network. To compute effective adversarial training examples, the Euclidean distance between the clean feature map and the adversarial feature map is used as the sample-optimization objective (see the first sketch after this summary). Extensive experiments demonstrate that FPAT effectively enhances the robustness of the model under various types of attack, and the results are interpreted through t-SNE visualization.

2. In the feature-based adversarial training method above, the training examples are obtained by maximizing the distance between clean and adversarial examples in the feature space. This thesis proposes two improvements to that process: first, to account for the distance information of the examples in both the feature space and the output space, a network prediction loss is added to the sample-optimization loss, forming a more flexible loss to guide the optimization; second, to reduce the impact of the non-uniformity of element values in the feature map, power normalization is applied to the feature map before the maximization (see the second sketch below). The experimental data and visualization results demonstrate that these measures bring additional performance gains to the FPAT model.
3. The intermediate-level attack algorithm is one of the methods specifically optimized to improve the attack strength of transferable examples. This thesis analyzes the shortcomings of the perturbation-calculation process in this method and proposes an enhanced intermediate-level attack algorithm called EILA (see the third sketch below). The improvements mainly lie in introducing a random-initialization strategy at the start of the perturbation optimization, using the enhanced loss function EILAP designed in this thesis to guide the direction of the perturbation computation, and integrating a momentum mechanism into the gradient-update process. Among these, the proposed EILAP loss simultaneously exploits information about the EILA examples in the network's feature space and output space, providing a better optimization direction for the examples. A large number of experiments fully demonstrate that EILA effectively improves the attack strength of transferable examples.
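The following is a minimal PyTorch-style sketch of the two-phase FPAT procedure summarized in point 1, assuming the model is split into a feature_extractor (up to the chosen intermediate layer) and a classifier head. The function name fpat_step, the L-infinity budget eps, the step size alpha, and the number of inner steps are illustrative assumptions; the abstract does not specify the thesis's exact layer choice or hyperparameters.

```python
import torch
import torch.nn.functional as F

def fpat_step(feature_extractor, classifier, x, y, optimizer,
              eps=8/255, alpha=2/255, steps=10):
    """One FPAT iteration on a mini-batch (x, y)."""
    # Phase 1: construct adversarial training examples by maximizing the
    # (squared) Euclidean distance between clean and adversarial feature maps.
    with torch.no_grad():
        clean_feat = feature_extractor(x)              # reference feature map
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        feat_loss = F.mse_loss(feature_extractor(x_adv), clean_feat)
        grad = torch.autograd.grad(feat_loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the feature loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    # Phase 2: update the network on the generated examples.
    optimizer.zero_grad()
    loss = F.cross_entropy(classifier(feature_extractor(x_adv.detach())), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

fpat_step would be called once per mini-batch inside an ordinary training loop, replacing the standard adversarial-training step.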
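The two improvements of point 2 can be expressed as a modified sample-optimization loss. The signed square-root form of the power normalization and the weighting factor lam are assumptions made for illustration; the abstract states only that a prediction loss and power normalization are introduced.

```python
import torch
import torch.nn.functional as F

def power_normalize(feat, p=0.5):
    # Signed power normalization: compresses large activations so that a few
    # dominant elements do not dominate the feature distance (the exponent
    # p=0.5, i.e. a signed square root, is an assumed common choice).
    return feat.sign() * feat.abs().pow(p)

def flexible_sample_loss(feature_extractor, classifier, x_adv, clean_feat, y,
                         lam=1.0):
    # Feature-space distance on power-normalized maps, plus the network
    # prediction (cross-entropy) loss in the output space.
    adv_feat = feature_extractor(x_adv)
    feat_term = F.mse_loss(power_normalize(adv_feat),
                           power_normalize(clean_feat))
    pred_term = F.cross_entropy(classifier(adv_feat), y)
    return feat_term + lam * pred_term     # lam balances the two spaces
```

In phase 1 of the first sketch, flexible_sample_loss would replace the plain feat_loss as the quantity maximized when crafting the training examples.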
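Finally, a sketch of an EILA-style generation loop for point 3, combining the three stated improvement points: random initialization of the perturbation, a loss over both the feature space and the output space, and momentum in the gradient update. The concrete form of the EILAP loss and the MI-FGSM-style momentum normalization are assumptions; the abstract does not give the exact formulas.

```python
import torch
import torch.nn.functional as F

def eila_attack(feature_extractor, classifier, x, y,
                eps=8/255, alpha=2/255, steps=20, mu=1.0, lam=1.0):
    # Improvement 1: random initialization of the perturbation.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    with torch.no_grad():
        clean_feat = feature_extractor(x)
    momentum = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        adv_feat = feature_extractor(x_adv)
        # Improvement 2: EILAP-style loss (assumed form) that pushes the
        # sample away from the clean image in the intermediate feature space
        # while also raising the classification loss in the output space.
        loss = (F.mse_loss(adv_feat, clean_feat)
                + lam * F.cross_entropy(classifier(adv_feat), y))
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Improvement 3: momentum accumulation (MI-FGSM style, with a
            # per-sample L1-normalized gradient over image dimensions).
            norm = grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
            momentum = mu * momentum + grad / norm
            x_adv = x_adv + alpha * momentum.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

The returned x_adv is the transferable adversarial example: it is crafted on a white-box surrogate model and then evaluated against unseen black-box targets.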
Keywords/Search Tags:Adversarial Example, Adversarial Training, Image Classification, Transferable Attack, Deep Learning