Generating Adversarial Text With Policy Gradient Against Deep Learning Classifiers

Posted on: 2021-10-31  Degree: Master  Type: Thesis
Country: China  Candidate: W Zhou  Full Text: PDF
GTID: 2518306122468604  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, artificial intelligence technology has developed rapidly. It readily extracts features and internal relationships from natural data and is very good at fitting highly nonlinear relationships. As a result, it is widely used in image detection, speech recognition, and natural language processing tasks, where it performs excellently. As these applications spread, the security of artificial intelligence technology has attracted growing attention. In particular, the discovery of adversarial examples poses a serious threat to these applications. Studying how adversarial examples are generated helps explain why they exist and prompts deeper thinking about artificial intelligence technology. Studying both the generation of and the defense against adversarial examples, as an ongoing contest between attack and defense, can continuously improve artificial intelligence technology and further safeguard its security in practical applications.

Adversarial examples were first discovered in image classifiers: adding noise that is barely perceptible in the image can cause an image classifier to misclassify it. Because image features are rich and continuous, methods for generating adversarial images are very diverse. Text data, however, is discrete, so the methods used to generate adversarial images are not suitable for generating adversarial text. The main research content of this thesis is adversarial text generation, and a method for generating adversarial text based on policy gradient is proposed. In practice, it is difficult to obtain the model structure and parameters of the target text classifier, so the proposed method attacks the target model under black-box conditions.

Because text data is discrete, continuous perturbations cannot be added to it directly, so a text encoder is used to generate the adversarial text. The text encoder maps discrete text data into a continuous hidden space and generates text from the hidden-space feature vector. No standard adversarial text data set exists, so in order for the text encoder to generate adversarial text that retains the original semantics and causes the classifier to misclassify, the policy gradient algorithm from reinforcement learning is used to adjust the parameters of the text encoder. The policy gradient update maximizes the reward of each pair of original text and generated text. The reward calculation only needs to include the difference between the classification results for the two texts and the similarity between them; this is enough to adjust the parameters of the text generator and to generate adversarial text with a significant attack effect. Compared with existing methods that manually add word-level or character-level perturbations to the text, the proposed method can generate more natural adversarial text in batches.

Finally, the policy-gradient-based adversarial text generation method is implemented with the deep learning framework TensorFlow, and attack experiments are conducted on 7 real natural language processing data sets to analyze its performance. The experimental results show that the adversarial text generated by the proposed method can reduce the accuracy of a text classifier whose accuracy is 95.9% by 53.48%. The accuracy on the generated adversarial text across the different data sets is between 29.89% and 53.28%. The similarity score between the adversarial text and the original text is concentrated between 0.8 and 0.9. The generated adversarial text also has an attack effect on different text classification models, that is, it is transferable.
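The reward-driven update described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the abstract does not give the reward formula, its weights, or the generator architecture. The sketch uses plain Python instead of TensorFlow, replaces the text encoder with a categorical policy over three fixed candidate rewrites, and uses hypothetical names throughout (`reward`, `reinforce_step`, `train`, `alpha`, `beta`). The black-box classifier is represented only by the true-class probability it returns for each candidate.

```python
import math
import random


def reward(p_orig, p_adv, similarity, alpha=1.0, beta=1.0):
    """Assumed reward: drop in true-class probability plus text similarity.

    p_orig / p_adv are the black-box classifier's true-class probabilities
    for the original and generated text; alpha and beta are hypothetical
    weights that do not come from the thesis.
    """
    return alpha * (p_orig - p_adv) + beta * similarity


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, r, baseline, lr=0.1):
    """One REINFORCE update for a categorical policy.

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - p_i.
    """
    probs = softmax(logits)
    advantage = r - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train(candidates, p_orig, steps=3000, seed=0):
    """Sample a candidate, score it, and update the policy toward high reward.

    candidates: list of (p_adv, similarity) pairs standing in for generated
    texts that have already been scored by the classifier and a similarity
    measure.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        p_adv, sim = candidates[action]
        r = reward(p_orig, p_adv, sim)
        logits = reinforce_step(logits, action, r, baseline)
        baseline += 0.05 * (r - baseline)  # running-average baseline
    return softmax(logits)


# Toy candidates: (true-class probability after rewrite, similarity to original)
CANDIDATES = [
    (0.90, 0.95),  # faithful rewrite that does not fool the classifier
    (0.30, 0.85),  # fools the classifier and stays close to the original
    (0.20, 0.30),  # fools the classifier but loses the original meaning
]

if __name__ == "__main__":
    print(train(CANDIDATES, p_orig=0.95))
```

In this toy setting the policy should shift its probability mass toward the second candidate, the only one that both lowers the classifier's confidence and stays similar to the original. In the thesis the same kind of signal adjusts the parameters of a text generator rather than a three-way choice.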
Keywords/Search Tags: Adversarial text, Policy Gradient, Auto-encoder, GAN