
Adversarial Attack and Defense Algorithms Based on Mid- and High-Level Features of Deep Models

Posted on: 2021-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2518306503972539
Subject: Electronics and Communications Engineering
Abstract/Summary:
Although deep learning has achieved impressive performance on a wide range of tasks, researchers have found that deep models are vulnerable to adversarial examples. Adversarial examples are crafted by adding small perturbations to natural images, yet they cause deep models to produce wrong predictions, and thus serve as attacks against the models. The existence of adversarial examples threatens the deployment of deep learning and has therefore attracted wide research interest. In this thesis, we explore how to design algorithms based on the mid- and high-level features of deep models, addressing adversarial attack and adversarial defense respectively.

On the attack side, we propose the Generative Transferable Adversarial Attack (GTAA) algorithm. First, to improve generation efficiency, GTAA uses a generator network to produce adversarial perturbations, so that adversarial examples are generated in a single forward pass. Second, to enhance the cross-model transferability of adversarial examples and thereby improve black-box attack performance, GTAA exploits the observation that intermediate features of deep models transfer better across models: it designs a loss on intermediate features that maximizes the difference between adversarial and original examples at an intermediate layer (see the sketch below). Experiments show that GTAA is more efficient than gradient-based attack methods, and that its generator network converges faster than those of other generative methods. On the ILSVRC2012 dataset, GTAA performs comparably to other attack methods in white-box settings and achieves the best performance in black-box attacks. We also compare the effect of constraining different intermediate layers; the results show that the optimal intermediate layer chosen on the source model also yields the best performance on the black-box target model.

On the defense side, we explore an algorithm that corrects the predictions of adversarial examples using only the high-level semantic features of the model. Existing work usually relies on the input images to withstand adversarial examples. In real-world applications, however, there are scenarios where image data and low-level features are inaccessible or hard to transmit, such as systems with separated upstream and downstream components, or mobile devices; these scenarios require defenses based on high-level features. Through experimental observation, we find that the logits of adversarial examples and clean examples follow different distributions, and that adversarial attacks reflect the intrinsic relationships between classes in the logits; this makes logit-based adversarial defense possible. On this basis, we propose the Adversarial Logit Correction (ALC) algorithm, which uses a two-layer logit-correction network as a mapping from logits to corrected predictions (see the sketch below). In gray- and black-box settings, the correction network recovers the predictions of adversarial examples while leaving the predictions of clean examples unchanged, purely from logits. Experiments show that the correction network transfers to some extent across attack methods. By differentiating the correction network, we find that there exist "supporting classes" that respond strongly to attacks, and that for any pair of attacks, the overlap ratio of their supporting classes is highly correlated with the transferability of their correction networks.
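To make the attack objective concrete, below is a minimal PyTorch sketch of GTAA's feature-level loss, not the thesis implementation. The generator architecture, the choice of VGG16 as the source model, the hooked layer index, the perturbation budget, and all hyperparameters are illustrative assumptions.

```python
# Sketch of a GTAA-style objective: a generator produces a bounded perturbation,
# and training pushes the adversarial example's intermediate features away from
# the clean example's features on a frozen source model.
import torch
import torch.nn as nn
import torchvision.models as models

class PerturbationGenerator(nn.Module):
    """Hypothetical lightweight generator: image -> bounded perturbation."""
    def __init__(self, eps=16 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x):
        # tanh keeps the perturbation within the L_inf budget eps
        return self.eps * torch.tanh(self.net(x))

# Frozen source model; VGG16 is an assumption, not the thesis choice.
source_model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in source_model.parameters():
    p.requires_grad_(False)

# Capture one intermediate layer's output via a forward hook.
features = {}
source_model.features[16].register_forward_hook(
    lambda _m, _i, out: features.update(mid=out))

generator = PerturbationGenerator()
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

def gtaa_loss(x):
    """Negative feature distance: minimizing this maximizes the gap between
    adversarial and clean mid-level features."""
    x_adv = torch.clamp(x + generator(x), 0.0, 1.0)
    with torch.no_grad():
        source_model(x)
        f_clean = features["mid"]
    source_model(x_adv)
    return -torch.norm(features["mid"] - f_clean, p=2)

x = torch.rand(4, 3, 224, 224)  # stand-in for a batch of natural images
loss = gtaa_loss(x)
opt.zero_grad(); loss.backward(); opt.step()
```

Once trained, the generator yields an adversarial example in a single forward pass, `torch.clamp(x + generator(x), 0, 1)`, with no per-image gradient iterations.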
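Similarly, below is a minimal sketch of an ALC-style two-layer logit-correction network. The hidden width, optimizer, and training loop are assumptions; the thesis only specifies that a two-layer network maps logits to corrected predictions.

```python
# Sketch of ALC-style logit correction: a two-layer network maps raw logits
# (from clean and adversarial examples alike) to corrected class predictions,
# using no image data or low-level features.
import torch
import torch.nn as nn

NUM_CLASSES = 1000  # e.g. ILSVRC2012

correction_net = nn.Sequential(
    nn.Linear(NUM_CLASSES, 512),  # hidden width 512 is an assumption
    nn.ReLU(),
    nn.Linear(512, NUM_CLASSES),
)
opt = torch.optim.Adam(correction_net.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(logits, labels):
    """One update on a batch of logits paired with ground-truth labels,
    so the network learns to recover correct predictions for adversarial
    inputs while leaving clean predictions unchanged."""
    opt.zero_grad()
    loss = criterion(correction_net(logits), labels)
    loss.backward()
    opt.step()
    return loss.item()

logits = torch.randn(8, NUM_CLASSES)          # stand-in mixed logits
labels = torch.randint(0, NUM_CLASSES, (8,))  # stand-in labels
train_step(logits, labels)

# At deployment, the corrected prediction comes from logits alone.
pred = correction_net(logits).argmax(dim=1)
```

Because the network consumes only logits, it can sit downstream of a model whose images and low-level features never leave the upstream system, matching the deployment constraint described above.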
Keywords/Search Tags:adversarial examples, adversarial attacks, adversarial defenses, logits, transferability