
Research On Adversarial Attack Algorithm Based On Deep Neural Network

Posted on: 2022-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X H Cai | Full Text: PDF
GTID: 2518306605967849 | Subject: Communication and Information System
Abstract/Summary:
With the widespread application of deep learning, people have begun to question its safety and reliability. By studying adversarial examples and universal adversarial perturbations, we can discover the weaknesses of deep learning, improve the robustness and performance of deep neural networks, and promote the interpretability of neural networks. At present, most research focuses on how to make deep neural networks produce false predictions, and rarely observes how the features of the network's embedding space change from the perspective of interpretability. In addition, current universal adversarial perturbations not only lose their relevance to the target network but also cannot completely separate the adversarial example from the original sample. To solve the above problems, this thesis starts from the feature space and the angle of interpretability, and proposes a class activation mapping disruption attack algorithm and a universal adversarial perturbation generation algorithm, addressing specific and general perturbations respectively.

This thesis first analyzes the role of class activation mapping in network interpretability and, starting from the embedded features, constructs a feature-adaptive metric function. Based on this function and the gradient iteration method, a non-targeted white-box attack algorithm is proposed: the feature-space-based class activation mapping disruption attack algorithm. Through the adaptive feature metric function, the algorithm disrupts all feature extraction layers, causing large changes in the feature representation of adversarial examples and thereby disrupting the class activation mapping of predictions. Next, to deal with the loss of sample aggressiveness caused by image size conversion, a class activation mapping disruption algorithm with smoothing-filter compression is studied and implemented, which enhances the destructiveness of adversarial examples. Finally, in order to eliminate perturbation traces on adversarial examples to the greatest extent, the class activation mapping disruption algorithm is optimized under different norm constraints. Experimental results show that the proposed algorithm has strong white-box attack ability, transferability, and anti-defense ability. Attacking seven classification networks on an ILSVRC2012 data subset, its average attack success rate is 8.42% and 0.66% higher than FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent), respectively; attacking an object detection network on the VOC2007 dataset reduces the average precision to below 1%; and attacking several face recognition networks on a CASIA-WebFace subset, the attack success rate exceeds 95%.

Next, to strengthen the relevance of universal adversarial perturbations to the target network, improve the attack success rate on classification networks, and quickly generate a durable general perturbation, a universal adversarial perturbation generation algorithm based on generative adversarial networks is proposed. The algorithm obtains universal perturbations by decoding the output vector of the target classification network, where the output vector is obtained by cyclically activating the classification network with a constant vector between 0 and 1 as input. A deep convolutional generative adversarial network serves as the decoder of the activation information, the feature-adaptive metric function is used to suppress or enhance features, and the L2 norm constrains the perturbation size. Experimental results show that, attacking seven classification networks on the CIFAR-10 and ImageNet datasets, the proposed generation algorithm reaches average attack success rates of 86% and 96%, respectively, which are 19% and 16% higher than UAP (Universal Adversarial Perturbation). In addition, the universal adversarial perturbations can quickly be used for adversarial training to enhance the robustness of neural networks.

In terms of theory, this thesis studies the relationship between adversarial attacks and deep neural networks from the perspective of interpretability and the feature space, which helps to explore the internal mechanisms of deep neural networks and supports work on interpretability theory. In terms of application, the work of this thesis can evaluate the reliability of existing neural networks, improve network robustness through adversarial training, and improve the reliability of networks in real-world scenarios.
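The first algorithm described above — iterative gradient ascent on a feature-space distance under a norm budget — can be sketched as follows. This is a minimal illustration, not the thesis's actual method: the feature-adaptive metric function is replaced by a plain L2 distance between intermediate features, and a tiny randomly initialized network stands in for the classifiers attacked in the experiments.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in classifier; the thesis attacks real networks.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

feature_layer = model[1]  # an intermediate "embedded feature" to disrupt

def get_features(x):
    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda mod, inp, out: feats.setdefault("f", out))
    model(x)
    handle.remove()
    return feats["f"]

def feature_space_attack(x, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iteratively push the adversarial example's intermediate features
    away from the clean example's features, under an L-infinity budget."""
    clean_feat = get_features(x).detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Maximize the feature-space distance (plain L2 here, standing
        # in for the thesis's feature-adaptive metric function).
        loss = (get_features(x_adv) - clean_feat).pow(2).mean()
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()        # ascent step
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)   # project to budget
        x_adv = x_adv.clamp(0, 1)                           # valid image range
    return x_adv

x = torch.rand(1, 3, 32, 32)
x_adv = feature_space_attack(x)
```

Because only the feature distance is maximized, no label is needed, matching the non-targeted white-box setting; disrupting the features that class activation mapping is computed from is what degrades the explanation maps.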
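The decoder-based universal-perturbation idea — feed the target network a constant input, decode its output vector into a single perturbation, and train the decoder to fool the network on many images under an L2 constraint — can be sketched as below. This is a simplified illustration under stated assumptions: a linear model stands in for the target classifiers, a small fully connected decoder stands in for the DCGAN decoder, and the feature-adaptive metric is replaced by a plain cross-entropy fooling objective.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen stand-in target classifier.
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
for p in target.parameters():
    p.requires_grad_(False)

# Decoder mapping the target's 10-d output vector to a perturbation
# (a stand-in for the DCGAN decoder in the thesis).
decoder = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 3 * 32 * 32), nn.Tanh(),
)

# Activate the target with a constant input in [0, 1]; its output
# vector is what the decoder turns into the universal perturbation.
z = target(torch.full((1, 3, 32, 32), 0.5))

images = torch.rand(16, 3, 32, 32)
labels = target(images).argmax(dim=1)  # predictions the UAP should flip
opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)

for step in range(100):
    delta = decoder(z).view(1, 3, 32, 32)
    delta = delta * (10.0 / delta.norm().clamp(min=10.0))  # L2-ball constraint
    logits = target(images + delta)
    # Untargeted objective: maximize loss on the original predictions.
    loss = -nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

uap = decoder(z).view(1, 3, 32, 32).detach()
uap = uap * (10.0 / uap.norm().clamp(min=10.0))
```

A single `uap` tensor is reused across all inputs, which is what makes the perturbation "universal"; deriving it from the target's own activations is what ties it to that specific network.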
Keywords/Search Tags:Adversarial Attack, Deep Neural Network, Image Classification, Adversarial Example, Universal Adversarial Perturbation, Feature Space