
Research On Adversarial Defense Methods Against Adversarial Examples For Image Classification

Posted on: 2024-06-04    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C X Chen    Full Text: PDF
GTID: 1528307316971569    Subject: Cyberspace security
Abstract/Summary:
Deep learning methods can effectively explore the structural patterns of data, uncover its inherent regularities, and extend the range of scenarios in which data can be applied. They have been widely adopted in areas such as computer vision, natural language processing, and speech recognition. However, the phenomenon of adversarial examples has exposed the inherent vulnerability of deep learning models and raised concerns about their security. An adversarial example, generated by adding imperceptible perturbations to the original data, can cause an otherwise high-performing deep model to make incorrect predictions or even fail completely, even though it looks identical to the original example to a human observer. Recently, with the continuous development of adversarial example techniques, their abuse has caused a series of security problems around the world, such as the leakage of private data and the evasion of legal supervision, raising confusion and concern about the prospects of image-based artificial intelligence applications. It is therefore urgent to carry out defense research on image adversarial examples, providing a solid theoretical foundation and effective technical support for the safe deployment of artificial intelligence technology.

Existing adversarial example defense methods for image classification mainly fall into passive defense and active defense. The core of the former is to detect adversarial examples: by checking whether an input is abnormal (adversarial), it prevents adversarial examples from being fed into the deep classification network and thereby protects the model passively. The latter actively reinforces the model to reduce its adversarial vulnerability, so that adversarial examples are still classified correctly. However, research on adversarial example defense still faces three key issues. First, how to ensure a high detection success rate without affecting the basic framework of the target model, i.e., detection effectiveness and module applicability. Second, whether the reinforced classification model has stronger defense capabilities and can withstand multiple types of adversarial examples, i.e., defense performance and robustness. Third, whether the defended model can resist adversarial attacks coming from different models and different data, i.e., defense transferability.

Current adversarial defense methods have several shortcomings. Adversarial detection defenses often achieve a high detection success rate on certain types of adversarial examples but struggle to defend effectively against other types of attacks, and some detection methods rely on specific network structures, so the original network architecture has to be modified to deploy the defense. Adversarial reinforcement defenses are mainly implemented through adversarial retraining, but some of these methods have weak defense performance and cannot effectively resist adversarial examples unseen during training; in addition, adversarially reinforced models remain vulnerable to transferable adversarial examples. To address these limitations and advance the study of adversarial defense for deep classification networks, this thesis discusses the adversarial robustness of deep networks from different perspectives, aiming to detect adversarial examples effectively, improve adversarial defense performance, and ensure the robustness and transferability of the reinforced defense, thereby comprehensively promoting research on defending deep learning image classifiers against adversarial examples.

This thesis builds upon existing work and introduces improvements. The main research work and contributions are summarized as follows:

1. Detection of adversarial attacks with saliency map information. This thesis explores adversarial detection from the perspective of model interpretability and proposes an adversarial example detection method based on saliency maps. The method relies on saliency map techniques that visualize the internal prediction process of the model, guiding the detection framework to focus on the difference in how a deep neural network responds to clean examples versus adversarial examples, and using this difference to decide whether an input contains adversarial perturbations. Preprocessing techniques such as averaging and color conversion are used to effectively reduce the false detection rate of the framework. Moreover, because detection is built on the prediction mechanism of the deep model itself, the detection components do not depend on a specific model, which gives the method good applicability. Experimental results show that this detection method achieves a high detection success rate against mainstream adversarial examples, with an average improvement of 10% over mainstream detection methods.
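For concreteness, the sketch below illustrates this kind of saliency-based detection pipeline, assuming PyTorch. The saliency_map gradient map, the SaliencyDetector network, and the detect wrapper are illustrative assumptions of ours, not the thesis's actual components, and the preprocessing steps (averaging, color conversion) mentioned above are omitted.

```python
# Minimal sketch (assumed PyTorch): a gradient-based saliency map is computed for the
# classifier's prediction and passed to a small binary detector (clean vs. adversarial).
# Illustrative only; not the thesis's actual detection framework.
import torch
import torch.nn as nn

def saliency_map(classifier, x):
    """Gradient of the top predicted logit with respect to the input image."""
    x = x.clone().requires_grad_(True)
    logits = classifier(x)
    top = logits.max(dim=1).values.sum()                 # sum of top logits over the batch
    grad, = torch.autograd.grad(top, x)
    return grad.abs().max(dim=1, keepdim=True).values    # (B, 1, H, W) saliency map

class SaliencyDetector(nn.Module):
    """Tiny CNN that classifies a saliency map as clean (0) or adversarial (1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

    def forward(self, saliency):
        return self.net(saliency)

def detect(classifier, detector, x):
    """Flags inputs judged adversarial; the target classifier itself is left unmodified."""
    saliency = saliency_map(classifier, x)
    return detector(saliency).argmax(dim=1) == 1
```

Because the detector only consumes saliency maps, the target classifier's architecture and weights are untouched, which reflects the applicability claim made above.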
2. Towards adversarial robustness with multi-dimensional perturbations. This thesis presents an adversarial defense method based on multi-dimensional adversarial perturbations. The core of the method is to solve the max-min game in adversarial training: the stronger the attack capability of the adversarial examples used for training, the better the resulting defense. To ensure that the perturbations are strongly aggressive, a self-supervised contrastive learning method is adopted. First, a new contrastive objective function is constructed in a high-dimensional feature space to guide the deep network in learning strong feature representations along different dimensions (a coarse-grained feature dimension and a fine-grained batch dimension). On this basis, the contrastive objective is maximized to generate highly aggressive multi-dimensional adversarial perturbations. Finally, supervised adversarial training on these strong multi-dimensional perturbations ensures that the deep network acquires more robust features of the data, giving the model a better defense against adversarial attacks. Because no label information is used when generating the strong adversarial examples, more essential features of the data are captured, which also gives the method good defense robustness. Experimental evaluations show that this multi-dimensional perturbation defense performs well against different white-box and black-box adversarial examples.
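A minimal sketch of the general recipe follows, again assuming PyTorch: perturbations are crafted by maximizing a self-supervised contrastive loss without labels, and are then used for supervised adversarial training. The info_nce loss, the single reference view, and all hyperparameters are simplified assumptions of ours rather than the thesis's multi-dimensional objective.

```python
# Minimal sketch (assumed PyTorch): label-free perturbation generation by maximizing a
# contrastive loss, followed by supervised adversarial training on the perturbed inputs.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE-style contrastive loss between two batches of embeddings."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature             # (B, B) pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)         # matching pairs lie on the diagonal

def contrastive_perturbation(encoder, x, aug, eps=8/255, alpha=2/255, steps=5):
    """PGD-style perturbation crafted by MAXIMIZING the contrastive loss (no labels used)."""
    with torch.no_grad():
        z_ref = encoder(aug(x))                    # embedding of a clean augmented view
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = info_nce(encoder((x + delta).clamp(0, 1)), z_ref)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return delta.detach()

def adversarial_training_step(model, encoder, aug, x, y, optimizer):
    """Supervised adversarial training on the contrastively crafted examples."""
    delta = contrastive_perturbation(encoder, x, aug)
    optimizer.zero_grad()
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```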
3. Improving adversarial robustness with adversarial augmentations. This thesis proposes an adversarial defense reinforcement method aimed at improving both the adversarial accuracy and the standard prediction accuracy of the model. Existing adversarial defense methods usually improve adversarial accuracy at the expense of standard prediction accuracy, struggle to defend against adversarial examples crafted on substitute models, and require a large amount of labeled data. Therefore, we use a self-supervised adversarial attack method to explore effective data augmentations, and constrain the distance between adversarial augmentations through a new loss measure to prevent the adversarially augmented examples from degenerating into ordinary augmented data. Finally, iterative training under adversarial augmentation ensures that the deep encoder acquires good adversarial robustness. This is a self-supervised adversarial robust learning method that requires no label information, and the learned robust deep encoder can be applied to downstream classification and recognition tasks. Experimental results show that the proposed method incurs only a small decrease in prediction accuracy on clean examples while greatly improving the adversarial robustness of the model. Meanwhile, ARL achieves a defense effect comparable to supervised adversarial training when defending against white-box and black-box attacks, is significantly better than existing self-supervised adversarial learning techniques, and also exhibits higher defense transferability against adversarial examples.

4. Self-ensemble adversarial defense with gradient manipulation. This thesis proposes a new self-ensemble adversarial defense method to improve the adversarial robustness of deep networks. Unlike common ensemble defenses, which require training multiple models and aggregating their outputs to make the final decision, self-ensemble adversarial defense only requires adversarial training of a local model. Specifically, each batch-learning stage in the adversarial training process is regarded as a sub-model, and gradient manipulation is used to ensure that the gradient information of each sub-model is effective, so that each sub-model undergoes independent adversarial training and parameter updates. After adversarial training is completed, a deep model with good defense universality is produced by iteratively integrating the sub-model parameters. Because this defense only involves self-ensemble learning of the local model and does not need to update the parameters of other models, its computational cost is low. Experiments show that this defense method achieves better adversarial defense effectiveness, with a performance improvement of about 4% over current mainstream adversarial reinforcement defense baselines. A sketch of one plausible reading of the parameter-integration step follows.
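In the sketch below (assumed PyTorch), each adversarially trained batch update is treated as a sub-model whose weights are folded into a running ensemble by a moving average. The gradient-manipulation step and the thesis's exact integration rule are not described in the abstract, so this is an illustrative assumption only, not the actual method.

```python
# Minimal sketch (assumed PyTorch): self-ensemble by iteratively integrating the parameters
# of per-batch "sub-models" produced during adversarial training. Illustrative only.
import copy
import torch
import torch.nn.functional as F

def update_self_ensemble(ensemble_model, model, decay=0.999):
    """Fold the current sub-model's parameters into the running ensemble (moving average)."""
    with torch.no_grad():
        for p_ens, p_cur in zip(ensemble_model.parameters(), model.parameters()):
            p_ens.mul_(decay).add_(p_cur, alpha=1.0 - decay)

def train_with_self_ensemble(model, loader, make_adv, optimizer, epochs=1):
    """Adversarial training in which every batch update is treated as a sub-model."""
    ensemble_model = copy.deepcopy(model)           # holds the iteratively integrated weights
    for _ in range(epochs):
        for x, y in loader:
            x_adv = make_adv(model, x, y)           # e.g. a PGD attack on the current sub-model
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()
            update_self_ensemble(ensemble_model, model)
    return ensemble_model                           # deployed as the defended model
```

Only the single local model and its ensembled copy are kept in memory, which is consistent with the low computational cost claimed above relative to multi-model ensemble defenses.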
Keywords/Search Tags:Artificial Intelligence Security, Deep Classification Network, Adversarial Example, Detection Defense, Adversarial Defense Enhancement