
Research On The Generation And Defense Of Natural Adversarial Example In Text Field

Posted on: 2023-03-01    Degree: Master    Type: Thesis
Country: China    Candidate: X G Yang    Full Text: PDF
GTID: 2558306914456364    Subject: Computer Science and Technology
Abstract/Summary:
As adversarial example generation matured in the image field, researchers began to turn their attention to the text field. Natural language processing is widely used in practical scenarios such as online translation, sentiment classification, subtitle generation and speech recognition, and adversarial examples spread easily, so studying adversarial examples in the text field is of great practical significance. At present, adversarial examples in the text field are generated mainly by synonym replacement, which suffers from poor semantic preservation and a lack of targeting. The main defense method is adversarial training, which reduces the original classification accuracy of the model.

This thesis studies the generation of, and defense against, natural adversarial examples in the text domain. Natural adversarial examples are adversarial examples with fluent semantics and correct grammar. With three goals in mind (improving the semantics of adversarial examples, improving attack effectiveness, and enhancing model security), the thesis proposes two new adversarial example generation algorithms and one adversarial training method. The main contributions are as follows.

1. A semantic-preserving adversarial example generation algorithm. Existing word-replacement algorithms in the text domain choose a synonym according to how much replacing the word reduces the accuracy of the classifier. They do not consider how different synonyms affect the semantics of the adversarial example, which makes the resulting examples easy to detect. This thesis proposes a synonym selection index that integrates semantics and attack strength, and designs a semantic-preserving adversarial example generation algorithm, called SPA in this thesis. Experimental results show that, compared with the TextFooler and BAE attack algorithms, SPA generates adversarial examples with better semantics while maintaining a high attack success rate. For example, when attacking an LSTM model trained on the IMDB dataset, the accuracy of the model drops from 89% to 5.5%, and the semantics of the adversarial examples generated by SPA improve on the baseline methods under multiple indicators.

2. A targeted text attack algorithm based on an improved DeepFool. Existing adversarial example generation algorithms in the text domain do not achieve targeted attacks. DeepFool, an attack algorithm from the image field, computes the distance to non-target categories. Replacing that computation with the distance to the target category yields TargetDeepfool, which can carry out targeted attacks. The thesis uses TargetDeepfool to select the best synonym replacement in a text adversarial attack, giving a targeted attack algorithm for the text field called TextTargetFool. Experimental results show that TextTargetFool generates adversarial examples with good semantics while achieving a high success rate in perturbing inputs toward the target category. For example, on a textCNN model trained on the IMDB dataset with target category 0, TextTargetFool achieves a 99.8% success rate of target-category perturbation.

3. A model training method that defends against word-replacement examples while maintaining classifier accuracy. Existing defense methods in the text domain can defend against some adversarial examples during training, but they reduce the accuracy of the original classification model. This thesis designs a training method for defending against word-replacement examples, called DWR in this thesis. The method first generates interference examples by randomly perturbing the input, then feeds both the original examples and the interference examples into the model, and finally optimizes the model parameters with a three-part loss function. Experimental results show that, compared with the baseline method, DWR improves the model's ability to defend against adversarial examples without reducing the accuracy of the original classifier. For example, for an LSTM model trained on the MR dataset, the model trained with DWR improves accuracy by 0.8% and defense accuracy by 5.1% compared with the original model.
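The abstract does not give the formula for the synonym selection index in contribution 1, so the following is only a minimal sketch of the general idea of scoring a candidate synonym by both semantic similarity and attack strength. The weighted-sum form, the weight `alpha`, the use of cosine similarity between sentence vectors, and the names `selection_score` and `pick_substitute` are all assumptions made for illustration and are not taken from the thesis.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def selection_score(similarity, prob_drop, alpha=0.5):
    """Hypothetical index: weighted sum of semantic similarity and attack strength.

    similarity: similarity between the original and the modified sentence.
    prob_drop:  decrease in the classifier's true-class probability after the swap.
    alpha:      assumed trade-off weight (1.0 = semantics only, 0.0 = attack only).
    """
    return alpha * similarity + (1.0 - alpha) * prob_drop


def pick_substitute(orig_vec, orig_prob, candidates, alpha=0.5):
    """Return (word, score) for the candidate with the highest selection score.

    candidates: iterable of (word, sentence_vector_after_swap,
                true_class_probability_after_swap). In a real attack these
                would come from a sentence encoder and the victim model.
    """
    best_word, best_score = None, float("-inf")
    for word, vec, prob in candidates:
        score = selection_score(cosine(orig_vec, vec), orig_prob - prob, alpha)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy numbers, invented purely for illustration.
toy_candidates = [
    ("awful", [0.0, 1.0], 0.1),  # strong attack, poor semantic match
    ("poor", [1.0, 0.0], 0.5),   # weaker attack, perfect semantic match
    ("bad", [1.0, 1.0], 0.3),    # in between on both axes
]
choice, score = pick_substitute([1.0, 0.0], 0.9, toy_candidates, alpha=0.5)
```

With `alpha=0.0` the score reduces to the accuracy-drop criterion that the abstract attributes to existing methods. Raising `alpha` shifts the choice toward substitutes that stay closer to the original sentence.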
Keywords/Search Tags:adversarial example, semantic, synonymous substitution, targeted attack, defense method