
Attack And Defense Based On Neural Network Image Classification Interpretable Algorithms

Posted on: 2022-10-19    Degree: Master    Type: Thesis
Country: China    Candidate: Q Q Song    Full Text: PDF
GTID: 2518306509477514    Subject: Information and Communication Engineering
Abstract/Summary:
Deep neural networks have made great progress in image classification and recognition. However, the "end-to-end" decision logic and working mechanism of deep models make them "black box" models that human users cannot understand. Interpretability techniques are therefore being studied in the hope of explaining these models in an understandable way, so it is important to explain exactly how deep neural networks work. At the same time, a manipulated interpretation can undermine a human user's trust in the interpretation and thus mislead the user into distrusting a reliable network. It is therefore meaningful to design effective attacks that simulate the various threats possible in the real world, in order to evaluate the robustness of existing interpretable algorithms and to improve them.

This thesis studies how to attack the interpretable algorithms of deep neural networks. Since existing methods can attack only a few kinds of interpretable algorithms, and attacking activation mapping-based interpretable algorithms is costly, an adversarial noise method is proposed to attack the network's interpretation. An objective function is designed to optimize visually imperceptible noise added to a specific area of the input, so that the interpretation is concentrated in an arbitrarily specified area without changing the category the network predicts for the disturbed image. Two attack settings are proposed, the single-target attack and the multi-target attack, and the proposed method shows that state-of-the-art saliency maps, including back-propagation-based explanations (Full-Grad, Norm-Grad) and activation mapping-based explanations (Grad-CAM, Guided-Feature-Inversion, Score-CAM, Grad-CAM++, CAM), can be attacked easily and with a simpler form of adversarial noise. Experiments show that the attack transfers to different interpretable algorithms, and a universal noise attack with higher generalization performance is proposed. For Grad-CAM, this thesis also proposes an attack in the form of image patches. The attack results are verified both qualitatively and quantitatively and are prominent, which provides a feasible adversarial attack for evaluating the robustness of interpretations. Furthermore, a metric is proposed to measure the effectiveness of the method.
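
The following sketch only illustrates the general idea of such an interpretation attack; it is not the thesis's objective function. It uses a plain input-gradient saliency map as a stand-in for the CAM-family explanations named above and optimizes a bounded perturbation so that the saliency mass concentrates in a chosen target region while the predicted class is kept fixed. The names model, target_mask, eps and lam, as well as the exact loss, are illustrative assumptions.

import torch
import torch.nn.functional as F

def interpretation_attack(model, x, target_mask, eps=8 / 255,
                          lam=1.0, steps=200, lr=1e-2):
    """Hypothetical single-target attack sketch.

    x:           (1, C, H, W) image in [0, 1]
    target_mask: (1, 1, H, W) binary mask of the region the saliency
                 should be driven into
    """
    model.eval()
    with torch.no_grad():
        orig_class = model(x).argmax(dim=1)          # class to preserve

    delta = torch.zeros_like(x, requires_grad=True)  # adversarial noise
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        logits = model(x_adv)
        score = logits[0, orig_class].sum()

        # Simple input-gradient saliency, kept differentiable w.r.t. delta.
        sal = torch.autograd.grad(score, x_adv, create_graph=True)[0]
        sal = sal.abs().sum(dim=1, keepdim=True)
        sal = sal / (sal.sum() + 1e-8)

        # Push saliency mass into the target region while keeping the prediction.
        loss = -(sal * target_mask).sum() + lam * F.cross_entropy(logits, orig_class)

        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)                 # imperceptibility budget

    return (x + delta).clamp(0, 1).detach()

A multi-target variant of this sketch would simply sum the first loss term over several masks; the thesis's attacks on activation mapping-based explanations operate on activation maps rather than input gradients.
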
To enhance the robustness of interpretable algorithms and give them a certain defense against attacks, this thesis improves the quality of explanations by applying the meta-saliency method on top of existing interpretable algorithms. The interpretation is updated in the direction of the target class to improve its discriminability for that class, and the resulting meta-saliency interpretations remain correct under certain noise attacks and are not easy to attack. To demonstrate their effectiveness, this thesis conducts systematic evaluations in both the positive and the negative direction. In the positive direction, experiments show, through direct visual observation and quantitative evaluation with the class sensitivity metric and the deletion metric, that the interpretation heat maps improved with the meta-saliency method are more accurate and more class-discriminative. In the negative direction, robustness is tested under two attack algorithms that perturb the input, and the meta-saliency interpretations are shown to improve robustness against these attacks. Compared with the aggregate defense method, the experimental results show that the proposed method is more robust. The meta-saliency method in this thesis can improve many interpretable algorithms (including aggregate interpretations), is practical and general, and is simpler to implement.
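
As one concrete example of the quantitative evaluation mentioned above, the deletion metric, in its common formulation, removes pixels in decreasing order of saliency and tracks how quickly the class probability drops; a faster drop (smaller area under the curve) indicates a more faithful explanation. The sketch below assumes this standard formulation; the helper names, step size and gray baseline are illustrative and not taken from the thesis.

import torch

@torch.no_grad()
def deletion_score(model, x, saliency, target_class, step=0.01, baseline=0.5):
    """Area under the probability curve as pixels are deleted by saliency rank.

    x:        (1, C, H, W) image in [0, 1]
    saliency: (H, W) importance map for target_class
    """
    model.eval()
    _, c, h, w = x.shape
    order = saliency.flatten().argsort(descending=True)   # most salient first
    n_pixels = h * w
    per_step = max(1, int(step * n_pixels))

    x_del = x.clone()
    probs = []
    for start in range(0, n_pixels, per_step):
        p = torch.softmax(model(x_del), dim=1)[0, target_class].item()
        probs.append(p)
        idx = order[start:start + per_step]
        rows, cols = idx // w, idx % w
        x_del[0, :, rows, cols] = baseline                 # delete next chunk
    probs.append(torch.softmax(model(x_del), dim=1)[0, target_class].item())

    return sum(probs) / len(probs)                         # approx. AUC; lower is better

A lower score means the explanation points to pixels the model truly relies on, which is why it serves as a positive-direction check on the improved heat maps.
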
Keywords/Search Tags:Deep Neural Networks, Interpretable Algorithms, Interpretation Attack, Robustness, Meta-Saliency