
Adversarial Example Generation for Text Classification Neural Networks

Posted on: 2022-10-26    Degree: Master    Type: Thesis
Country: China    Candidate: C Liu    Full Text: PDF
GTID: 2518306479493304    Subject: Software engineering

Abstract/Summary:
Deep neural networks have achieved remarkable results in the field of natural language processing. As a branch of natural language processing, text classification models play an important role in sentiment analysis, spam detection, and news classification. These tasks remained very challenging until two neural network architectures were introduced: recurrent neural networks, which exploit contextual information to support classification decisions, and text graph convolutional networks, which abstract the relations between words and sentences into a topological graph and achieve high classification accuracy. However, recent research reveals that adversarial attacks can invalidate text classification models with a high success rate. Adversarial attacks therefore pose a serious threat to text classification models and limit their applicability.

Model robustness is critical for defending against adversarial attacks, and adversarial examples, when used for adversarial training, improve a model's robustness. Our goal is therefore to generate targeted adversarial examples that help text classification models resist adversarial attacks. Adversarial examples are generated by adding an adversarial perturbation to the input data. However, even small perturbations are easily detected by humans, and the semantics of the original text may change. A valid adversarial example is therefore one that fools the classifier while preserving the original semantics. For recurrent neural networks, many adversarial example generation algorithms have been proposed, but many suffer from low attack success rates and produce adversarial examples of low utility. For text graph convolutional networks, research on adversarial attacks is still in its infancy, so new generation algorithms are needed. This paper studies adversarial example generation for these two text classification models and presents three adversarial attack algorithms. The main work is summarized as follows:

1. PGD is among the strongest first-order adversarial attack algorithms in the image domain. It achieves a high attack success rate while keeping adversarial examples similar to the originals. In this paper, we transfer the PGD algorithm to the text domain to generate adversarial examples for recurrent neural networks. However, PGD perturbs every word in the text, which may change the semantics and syntax of the original. To keep the generated perturbation imperceptible and the adversarial example semantically similar to the original text, we propose the Extend-PGD algorithm, which modifies only a few words. The algorithm first computes the Jacobian matrix of the classifier to identify the words with the greatest impact on the classification result. After sorting the words in descending order of influence, it uses PGD to perturb them in turn; a perturbed word replaces the original only if it preserves the original semantics. Each iteration perturbs a single word, until the attack goal is achieved or the modification limit is reached (a minimal sketch follows). Comparison experiments show that the adversarial examples generated by Extend-PGD achieve both a higher attack success rate and better semantics preservation.
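The following is a minimal sketch of the Extend-PGD idea, not the thesis code. The `RNNClassifier` stand-in, the `forward_from_embeds` hook, the cosine-similarity semantics check, and all constants (eps, step size, thresholds) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNClassifier(nn.Module):
    """Toy stand-in for the attacked recurrent text classifier."""
    def __init__(self, vocab_size=1000, dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.fc = nn.Linear(dim, num_classes)

    def forward_from_embeds(self, e):            # e: (1, seq_len, dim)
        out, _ = self.rnn(e)
        return self.fc(out[:, -1])               # logits, shape (1, classes)

def extend_pgd_attack(model, token_ids, label, eps=1.0, alpha=0.25,
                      steps=20, sim_threshold=0.7, max_mods=3):
    """Perturb at most `max_mods` influential words with PGD on embeddings."""
    emb_matrix = model.embed.weight.detach()     # (V, dim)
    e = model.embed(token_ids).detach()          # (1, L, dim)

    # 1. Score word influence via the gradient of the loss w.r.t. the
    #    embeddings (a row-norm summary of the Jacobian-based importance).
    e_req = e.clone().requires_grad_(True)
    F.cross_entropy(model.forward_from_embeds(e_req), label).backward()
    order = e_req.grad.norm(dim=-1).squeeze(0).argsort(descending=True)

    adv_ids = token_ids.clone()
    for pos in order[:max_mods]:
        # 2. PGD on one word's embedding, projected onto an eps-ball.
        orig = e[0, pos].clone()
        delta = torch.zeros_like(orig, requires_grad=True)
        for _ in range(steps):
            e_adv = model.embed(adv_ids).detach()
            e_adv[0, pos] = orig + delta
            loss = F.cross_entropy(model.forward_from_embeds(e_adv), label)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        # 3. Snap to the nearest vocabulary word; accept it only if it stays
        #    semantically close (cosine similarity as a crude proxy).
        cand = (orig + delta).detach()
        new_id = F.cosine_similarity(emb_matrix, cand.unsqueeze(0)).argmax()
        if F.cosine_similarity(emb_matrix[new_id], orig, dim=0) >= sim_threshold:
            adv_ids[0, pos] = new_id
        # 4. Stop as soon as the classifier is fooled.
        with torch.no_grad():
            if model.forward_from_embeds(model.embed(adv_ids)).argmax() != label:
                break
    return adv_ids
```

Under these assumptions, a call such as `extend_pgd_attack(RNNClassifier(), torch.randint(0, 1000, (1, 12)), torch.tensor([1]))` would return a token sequence differing from the input in at most `max_mods` positions.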
2. C&W is an optimization-based adversarial attack algorithm from the image domain. It achieves a high attack success rate, and the perturbation it generates is quasi-imperceptible. In this paper, the C&W algorithm is transferred to the text domain to attack recurrent neural networks. However, C&W creates a dense perturbation over the given input, and this dense perturbation can easily change the semantics of a sentence. Our Extend-C&W attack algorithm introduces ℓ1 regularization to improve the sparseness of the perturbation and minimize the modifications to the original input text. Extend-C&W generates adversarial perturbations by solving an optimization problem; in each iteration, the top-3 nearest neighbors of the original word in the embedding space form a candidate set, and if a perturbed word is not in the candidate set, it is replaced by the candidate closest to it in the embedding space. The adversarial examples generated by Extend-C&W retain the semantics of the original sentence and have good utility (a sketch follows after item 3).

3. FGA is a gradient-based adversarial attack algorithm in the field of graph data. It effectively generates adversarial perturbations for ordinary graphs and achieves a high attack success rate. A text graph convolutional network is trained on a text graph, which is a weighted graph. This paper therefore extends FGA to text graphs and proposes the Graph-attack algorithm to generate adversarial examples for text graph convolutional networks. Graph-attack perturbs the node features and the edges of the text graph simultaneously: it uses gradients to modify node features and, inspired by FGA, modifies edge weights. For each type of perturbation, a corresponding perturbation-limiting strategy is adopted to ensure the effectiveness of the generated adversarial examples (a sketch follows). Experimental results show that Graph-attack effectively reduces the accuracy of the model.
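A minimal sketch of the Extend-C&W idea from item 2, reusing the hypothetical `RNNClassifier` above: the margin-loss form, the ℓ1 coefficient, and the snapping heuristic are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def extend_cw_attack(model, token_ids, label, c=1.0, lr=0.1,
                     iters=100, l1_coef=0.05, k_neighbors=3):
    """Optimize a sparse embedding perturbation, then snap to candidates."""
    emb_matrix = model.embed.weight.detach()
    e_orig = model.embed(token_ids).detach()          # (1, L, dim)
    delta = torch.zeros_like(e_orig, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    y = int(label)

    for _ in range(iters):
        logits = model.forward_from_embeds(e_orig + delta)[0]
        others = torch.cat([logits[:y], logits[y + 1:]])
        # C&W-style margin loss plus an l1 penalty that keeps the
        # perturbation sparse, i.e. concentrated on few words.
        loss = (c * (logits[y] - others.max()).clamp(min=0)
                + l1_coef * delta.abs().sum())
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Snap every noticeably perturbed word onto its candidate set: the
    # top-k nearest neighbors of the *original* word in embedding space.
    adv_ids = token_ids.clone()
    e_adv = (e_orig + delta).detach()
    for pos in range(token_ids.size(1)):
        if delta[0, pos].abs().sum() < 1e-3:
            continue                                  # word left untouched
        sims = F.cosine_similarity(emb_matrix, e_orig[0, pos].unsqueeze(0))
        cands = sims.topk(k_neighbors + 1).indices[1:]  # drop the word itself
        dists = (emb_matrix[cands] - e_adv[0, pos]).norm(dim=-1)
        adv_ids[0, pos] = cands[dists.argmin()]
    return adv_ids
```

And a minimal sketch of the Graph-attack idea from item 3. The two-layer `TextGCN` stand-in, the edge budget, and the step sizes are assumptions; the clamping of edge weights illustrates one possible perturbation-limiting strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGCN(nn.Module):
    """Toy two-layer GCN stand-in for the attacked text graph classifier."""
    def __init__(self, in_dim=32, hid=16, num_classes=2):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid)
        self.w2 = nn.Linear(hid, num_classes)

    def forward(self, adj, x):                   # adj: (N, N), x: (N, in_dim)
        h = torch.relu(adj @ self.w1(x))
        return adj @ self.w2(h)                  # per-node logits

def graph_attack(gcn, adj, feats, target, label,
                 edge_budget=5, feat_step=0.1, edge_step=0.1):
    """Perturb node features and edge weights of a weighted text graph."""
    adj = adj.clone().requires_grad_(True)
    feats = feats.clone().requires_grad_(True)
    loss = F.cross_entropy(gcn(adj, feats)[target:target + 1], label)
    g_adj, g_feat = torch.autograd.grad(loss, (adj, feats))

    # Node-feature perturbation: a small step along the gradient sign.
    feats_adv = (feats + feat_step * g_feat.sign()).detach()

    # Edge perturbation, FGA-style: move the weights of the edges with the
    # largest gradient magnitude, clamped so the result remains a valid
    # weighted graph (a simple perturbation-limiting strategy).
    adj_adv = adj.detach().clone()
    n = adj_adv.size(1)
    for idx in g_adj.abs().flatten().topk(edge_budget).indices:
        i, j = divmod(int(idx), n)
        adj_adv[i, j] = (adj_adv[i, j]
                         + edge_step * g_adj[i, j].sign()).clamp(0, 1)
    return adj_adv, feats_adv
```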
Keywords/Search Tags:Adversarial Attack, Text Classification, Recurrent Neural Network, Text Graph Convolutional Network