Research On Text Representation And Text Classification Method Based On Adversarial Training

Posted on: 2021-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhang
GTID: 2428330614471774
Subject: Computer technology
Abstract/Summary:
Text representation and text classification are basic tasks of natural language processing and are widely used in webpage interception, mail filtering, and information retrieval. Although current text representation and text classification models based on deep neural networks achieve good results, they have proved prone to overfitting in practice. In recent years, adversarial training has been applied to natural language processing. By adding adversarial perturbations during training, it reduces the model's sensitivity to such perturbations and can effectively alleviate overfitting. This paper therefore studies text representation and text classification methods based on adversarial training. The specific work is as follows:
(1) A text representation and text classification model, LM-LSTM-AdvT, based on language model adversarial training is designed. The LM-LSTM-AdvT model first trains a text representation model, an LSTM-based recurrent neural network language model, to obtain the text representation and network weights, and then trains an LSTM-based text classification model for text classification. To alleviate overfitting, both the text representation model and the text classification model adopt the FCM-based adversarial training method: an adversarial perturbation derived from the gradient of the loss function with respect to the word vectors is added at the word vector layer. Experimental results show that, compared with the LM-LSTM model, which does not use adversarial training, accuracy on the AGNews, Subj, MPQA, CR, and MR data sets improved by 0.14%, 2.17%, 3.84%, 8.48%, and 6.13%, respectively. In addition, compared with the LM-AdvT model, which uses adversarial training only in the LSTM-based recurrent neural network language model used for text representation, the LM-LSTM-AdvT model improves accuracy by 0.87%, 1.01%, 0.12%, and 0.16% on the Subj, MPQA, CR, and MR data sets.
(2) A text representation and text classification model, SA-AdvT-LSTM-AdvT, based on sequence autoencoder adversarial training is designed. The SA-AdvT-LSTM-AdvT model first trains an LSTM-based sequence autoencoder as the text representation model for feature extraction, obtaining the text representation and network weights, and then trains an LSTM-based text classification model for text classification. To alleviate overfitting, the text representation model and the text classification model are again trained with the FCM-based adversarial training method. To study the effect of SA-AdvT-LSTM-AdvT thoroughly, this paper designs five additional comparison models. Experimental results show that, compared with the SA-LSTM model, which does not use adversarial training, the accuracy of the SA-AdvT-LSTM-AdvT model improved by 0.78%, 3.05%, 2.1%, 1.11%, and 5.27% on the AGNews, Subj, MPQA, CR, and MR data sets. Compared with the LM-LSTM-AdvT model, the SA-AdvT-LSTM-AdvT model improved accuracy by 0.82%, 0.14%, 0.45%, 0.89%, and 0.58% on the same five data sets. In addition, this paper compares the LM-LSTM-AdvT and SA-AdvT-LSTM-AdvT models with existing text representation and text classification models such as VVD, CNN, and FastText. The results show that the SA-AdvT-LSTM-AdvT model achieves higher accuracy than the other models on the AGNews, Subj, MPQA, CR, and MR data sets.
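The abstract describes the adversarial perturbation only as being built from the gradient of the loss with respect to the word vectors. A minimal sketch of that idea follows. It assumes the common L2-normalized form r = epsilon * g / ||g||, which the abstract does not confirm, and it replaces the LSTM with a toy classifier (mean of word vectors fed to a logistic unit) so the gradient can be written by hand. All function names, the toy classifier, and the example values are hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embeddings, weights, bias, label):
    """Cross-entropy loss of a toy classifier and its gradient w.r.t. each word vector.

    The toy classifier (an assumption, standing in for the LSTM) averages the
    word vectors and applies a single logistic unit.
    """
    n = len(embeddings)
    dim = len(weights)
    avg = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    p = sigmoid(sum(w * x for w, x in zip(weights, avg)) + bias)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    # dL/dz = p - label, and dz/de_i = weights / n for every word vector e_i.
    grad = [[(p - label) * w / n for w in weights] for _ in embeddings]
    return loss, grad


def adversarial_perturbation(grad, epsilon):
    """Scale the gradient to L2 norm epsilon (assumed normalization)."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [[0.0 for _ in row] for row in grad]
    return [[epsilon * g / norm for g in row] for row in grad]


def adversarial_loss(embeddings, weights, bias, label, epsilon):
    """Loss evaluated on word vectors shifted by the adversarial perturbation."""
    _, grad = loss_and_grad(embeddings, weights, bias, label)
    r = adversarial_perturbation(grad, epsilon)
    perturbed = [[x + d for x, d in zip(e, re)] for e, re in zip(embeddings, r)]
    adv_loss, _ = loss_and_grad(perturbed, weights, bias, label)
    return adv_loss, r


if __name__ == "__main__":
    # Two 2-dimensional word vectors for one sentence (made-up values).
    emb = [[0.2, -0.1], [0.4, 0.3]]
    clean, _ = loss_and_grad(emb, [1.0, -2.0], 0.1, 1)
    adv, _ = adversarial_loss(emb, [1.0, -2.0], 0.1, 1, epsilon=0.5)
    print("clean loss:", clean, "adversarial loss:", adv)
```

In adversarial training of this kind, the adversarial loss is typically added to the clean loss and both are minimized together, so the model is pushed to stay accurate under the worst-case small shift of its word vectors. How the thesis combines the two terms is not stated in the abstract.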
Keywords/Search Tags:Text representation, Text classification, Adversarial training, Language model, Sequence autoencoder