Natural language processing is an important step for artificial intelligence to move from perception to cognition, and it has significant application value in both military and civilian fields. Deep learning plays an increasingly important role in natural language processing thanks to its powerful self-learning and data-processing abilities. However, although deep learning-based natural language processing models show excellent performance, they are vulnerable to maliciously crafted adversarial examples, which poses a serious challenge to deploying these models in real-world scenarios. To further study the vulnerabilities and security blind spots of natural language processing models based on deep neural networks, researchers have analyzed the causes and construction methods of adversarial examples at both the theoretical and technical levels, and have proposed a variety of adversarial attack methods for evaluating and strengthening natural language processing models.

Existing research on textual adversarial attacks, however, suffers from two main problems. First, the attack cost of existing methods is too high. In the black-box setting in particular, the attacker cannot access information such as the internal structure and parameters of the target model, so the generation of adversarial examples can only be guided by the target model's inputs and outputs, which requires a large number of queries; obtaining the desired number of adversarial examples is therefore often costly. Second, the adversarial perturbations generated by existing methods are usually tailored to a specific input sample, so each perturbation must be found by repeatedly iterating on a single sample, which is inefficient.

To address these problems, this dissertation studies the key technologies of textual adversarial attack from three aspects. First, to address the excessive number of queries to the target model in existing methods, we propose an adversarial attack method under a limited query budget. By using the information of adversarial examples generated on a local model, we shift part of the attack on the target model to the local model, completing it in advance; this greatly reduces the query cost of the attack while maintaining a high attack success rate. Compared with existing black-box attacks, this method reduces the average query cost by more than 46%. Second, since most existing attack methods search for an adversarial perturbation for each input sample one by one, we design a universal adversarial attack method based on discrete particle swarm optimization. By finding a universal trigger and adding it to any input sample, we can cause the classifier to make wrong predictions; this method fools multiple classifiers with a high success rate. Third, most existing universal adversarial attack methods generate triggers of low quality that are easy to identify. We propose a universal adversarial attack method based on BERT sampling, which effectively generates universal triggers; evaluations combining word frequency, fluency, grammaticality, and human judgment demonstrate the naturalness of the generated triggers. Illustrative sketches of the three ideas follow.
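To make the first contribution concrete, the sketch below shows the general transfer-then-verify pattern it builds on. This is a minimal, hypothetical outline rather than the dissertation's actual implementation: the `local_candidates` and `query_target` callables are assumed interfaces. Candidates are crafted against a local surrogate model at zero query cost, and the black-box target is queried only to verify them.

```python
from typing import Callable, List, Optional, Tuple

def transfer_then_verify(
    text: str,
    true_label: int,
    local_candidates: Callable[[str, int], List[str]],  # attack run on the local surrogate (query-free)
    query_target: Callable[[str], int],                  # one black-box query per call
    budget: int = 50,
) -> Tuple[Optional[str], int]:
    """Search on a local model; spend target queries only on verification."""
    # Step 1: craft candidate adversarial texts against the local model.
    # This step costs zero queries to the black-box target.
    candidates = local_candidates(text, true_label)
    # Step 2: verify candidates on the target within the query budget.
    queries = 0
    for cand in candidates:
        if queries >= budget:
            break
        queries += 1
        if query_target(cand) != true_label:
            return cand, queries  # a transferable adversarial example
    return None, queries          # budget exhausted without success
```

Because the expensive search happens on the surrogate, the target model only pays for verification, which is what drives the reduction in average query cost.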
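The second contribution searches the discrete space of trigger tokens with particle swarm optimization. The following is one common way to discretize PSO for such a search, given as a sketch under assumed interfaces rather than the dissertation's exact algorithm: each particle is a candidate trigger, and at every step each token position is either mutated at random or copied from the particle's personal best or the swarm's global best, with the copy probabilities playing the role of velocities. The `fooling_rate` callable (the fraction of held-out inputs misclassified when the trigger is attached) is an assumed evaluation function.

```python
import random
from typing import Callable, List

def pso_universal_trigger(
    vocab: List[str],                             # candidate trigger tokens
    fooling_rate: Callable[[List[str]], float],   # objective: higher is better
    trigger_len: int = 3,
    swarm_size: int = 20,
    iters: int = 50,
    p_mut: float = 0.1,                           # random mutation keeps exploring
    p_best: float = 0.5,                          # prob. of copying from personal best
) -> List[str]:
    # Initialize the swarm with random triggers and record bests.
    swarm = [[random.choice(vocab) for _ in range(trigger_len)] for _ in range(swarm_size)]
    pbest = [p[:] for p in swarm]
    pscore = [fooling_rate(p) for p in swarm]
    g = max(range(swarm_size), key=lambda i: pscore[i])
    gbest, gscore = pbest[g][:], pscore[g]
    for _ in range(iters):
        for i, particle in enumerate(swarm):
            # Discrete "velocity": per-position mutation or attraction to bests.
            for j in range(trigger_len):
                r = random.random()
                if r < p_mut:
                    particle[j] = random.choice(vocab)
                elif r < p_mut + p_best:
                    particle[j] = pbest[i][j]
                else:
                    particle[j] = gbest[j]
            s = fooling_rate(particle)
            if s > pscore[i]:
                pbest[i], pscore[i] = particle[:], s
                if s > gscore:
                    gbest, gscore = particle[:], s
    return gbest
```

Because a single trigger is optimized against many inputs at once, the per-sample iteration of conventional attacks is amortized away.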
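The third contribution uses BERT's masked language model to keep triggers natural. As a sketch of the underlying sampling step (the model name, trigger length, and the surrounding search loop are assumptions, not the dissertation's exact setup), one trigger position at a time is masked and BERT proposes fluent replacements; an outer search loop would then keep whichever proposal best trades off fooling rate and fluency.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def propose_tokens(trigger: list, pos: int, k: int = 10) -> list:
    """Mask trigger[pos] and return BERT's top-k fluent replacements."""
    masked = trigger[:pos] + [tok.mask_token] + trigger[pos + 1:]
    inputs = tok(" ".join(masked), return_tensors="pt")
    mask_idx = (inputs.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_idx[0]]
    top_ids = torch.topk(logits, k).indices.tolist()
    return tok.convert_ids_to_tokens(top_ids)

# Hypothetical usage: propose natural third tokens for a three-word trigger.
print(propose_tokens(["the", "movie", "was"], pos=2))
```

Sampling candidates from a language model rather than from the raw vocabulary is what pushes the resulting triggers toward the word-frequency, fluency, and grammaticality profiles that the human evaluation measures.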