Research On Text Representation And Text Classification Method Based On Adversarial Training

Posted on:2021-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:X H Zhang

Full Text:PDF

GTID:2428330614471774

Subject:Computer technology

Abstract/Summary:

Text representation and text classification are basic tasks of natural language processing and are widely used in webpage interception, mail filtering, and information retrieval. Although current deep neural network-based text representation and text classification models achieve good results, they have proved prone to overfitting in practice. In recent years, adversarial training has been applied to natural language processing. By adding adversarial perturbations to a deep neural network during training, it reduces the model's sensitivity to such perturbations and can effectively alleviate overfitting. This thesis therefore studies text representation and text classification methods based on adversarial training. The specific work is as follows:

(1) A text representation and text classification model, LM-LSTM-AdvT, based on language model adversarial training is designed. The LM-LSTM-AdvT model first trains a text representation model, an LSTM-based recurrent neural network language model, to obtain the text representation and network weights, and then trains an LSTM-based text classification model for text classification. To alleviate overfitting, both the text representation model and the text classification model adopt the FCM-based adversarial training method; that is, an adversarial perturbation derived from the gradient of the loss function with respect to the word vectors is added at the word vector layer. The experimental results show that, compared with the LM-LSTM model, which does not use adversarial training, accuracy on the data sets AGNews, Subj, MPQA, CR, and MR improved by 0.14%, 2.17%, 3.84%, 8.48%, and 6.13%, respectively. In addition, compared with the LM-AdvT model, which uses adversarial training only in the LSTM-based recurrent neural network language model used for text representation, the LM-LSTM-AdvT model improves accuracy by 0.87%, 1.01%, 0.12%, and 0.16% on the data sets Subj, MPQA, CR, and MR, respectively.

(2) A text representation and text classification model, SA-AdvT-LSTM-AdvT, based on sequence autoencoder adversarial training is designed. The SA-AdvT-LSTM-AdvT model first trains an LSTM-based sequence autoencoder text representation model for feature extraction to obtain the text representation and network weights, and then trains an LSTM-based text classification model for text classification. To alleviate overfitting, the text representation model and the text classification model are again both trained with the FCM-based adversarial training method. To study the effect of SA-AdvT-LSTM-AdvT thoroughly, this thesis designs five further comparison models. The experimental results show that, compared with the SA-LSTM model, which does not use adversarial training, the accuracy of the SA-AdvT-LSTM-AdvT model improved by 0.78%, 3.05%, 2.1%, 1.11%, and 5.27% on the data sets AGNews, Subj, MPQA, CR, and MR, respectively. Compared with the LM-LSTM-AdvT model, the SA-AdvT-LSTM-AdvT model improves accuracy by 0.82%, 0.14%, 0.45%, 0.89%, and 0.58% on the same five data sets, respectively. In addition, this thesis compares the LM-LSTM-AdvT and SA-AdvT-LSTM-AdvT models with existing text representation and text classification models such as VVD, CNN, and FastText. The experimental results show that the accuracy of the SA-AdvT-LSTM-AdvT model is better than that of the other models on the data sets AGNews, Subj, MPQA, CR, and MR.
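
The adversarial training step described in item (1), perturbing the word vectors along the gradient of the loss, can be illustrated with a small sketch. This is not the thesis's implementation: the mean-pooled linear classifier stands in for the LSTM, and the L2 normalisation of the gradient, the perturbation size `epsilon`, and every function and variable name are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_embedding_grad(emb, w, b, y):
    """Binary cross-entropy of a mean-pooled linear classifier (a stand-in
    for the LSTM) and its gradient with respect to the word vectors.

    emb: (seq_len, dim) word vectors of one sentence
    w:   (dim,) classifier weights; b: scalar bias; y: label in {0, 1}
    """
    pooled = emb.mean(axis=0)
    p = sigmoid(pooled @ w + b)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # dL/dlogit = p - y; each word vector contributes 1/seq_len to the pool.
    grad = np.tile((p - y) * w / emb.shape[0], (emb.shape[0], 1))
    return loss, grad

def adversarial_perturbation(grad, epsilon):
    """Rescale the gradient to L2 norm epsilon (assumed hyperparameter)."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

# Toy data: one sentence of 5 words with 8-dimensional word vectors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
w = rng.normal(size=8)
b = 0.0
y = 1

clean_loss, grad = loss_and_embedding_grad(emb, w, b, y)
r_adv = adversarial_perturbation(grad, epsilon=0.5)
adv_loss, _ = loss_and_embedding_grad(emb + r_adv, w, b, y)

# A training step would minimise the clean loss plus the adversarial loss.
total_loss = clean_loss + adv_loss
```

Because the perturbation points along the loss gradient, the loss on the perturbed word vectors is higher than on the clean ones; training on both is what is meant to make the model less sensitive to such perturbations.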

Keywords/Search Tags:

Text representation, Text classification, Adversarial training, Language model, Sequence autoencoder

Related items

1	Research On Text Representation Model And Application In Text Classification And Natural Language Inference
2	The Research On Local Smooth Preserving Of Manifold Regularization Auto Encoder For Text Representation
3	Research On Adversarial Sample Generation And Defense Methods For Text Classification
4	Research On The Scoring Method Of Open-ended Question Answer Based On Adversarial Text
5	Researching Text Classification Using Semantic And Sequence Information
6	Text Classification Based On Deep Transfer Learning
7	Research On Text Sentiment Analysis Based On Adversarial Training
8	Research On Improved Text Representation Model Based On BERT
9	Research On Multi-label Text Classification Based On Hybrid Neural Network
10	Short Text-based Adversarial Example Attack