Research On Sentiment Classification Of Multilingual Network Comments Based On XLM-R

Posted on: 2021-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Shi
GTID: 2518306563464614
Subject: Computer technology
Abstract/Summary:
The continuous development of mobile Internet technology and the spread of network applications have made people accustomed to expressing their opinions on diverse emerging platforms such as social media and technical blogs, which promotes social progress. However, this has also brought problems such as online violence and gender discrimination, which have a non-negligible impact on people's lives. It is therefore meaningful to identify malicious comments on the Internet. At the same time, accelerating globalization and the growing number of Internet users have made the languages used online diverse, and some applications even contain multiple languages. If comments in every language were recognized with a separate language-specific model, collecting the corpora and building and applying the models would be difficult. It is therefore essential to use a cross-lingual language model to classify multilingual network comments as malicious or non-malicious. This thesis studies the classification of multilingual online comments, including an improved method of extracting text features from a cross-lingual language model and an evaluation of the classification performance of different text classification algorithms, as follows:

1. Four representative cross-lingual language models from recent years, mBERT, XLM, Unicoder and XLM-R, are described theoretically, and their classification performance is compared experimentally. The results show that XLM-R has the highest accuracy and AUC value, so XLM-R is taken as the main object of the subsequent study.

2. To obtain more text features from XLM-R, this thesis proposes rebuilding its network structure by combining the 10th, 11th and 12th layers (the last three layers) of XLM-R to create the XLM-R-3 model. Experimental results show that XLM-R-3 has higher accuracy and AUC value than the initial model and other combination models. XLM-R-3 is then used as the word embedding layer and combined with different conventional classification algorithms. The results show that XLM-R-3 achieves the highest accuracy and AUC value when combined with a support vector machine.

3. Deep learning classification algorithms are used to extract further features. With XLM-R-3 as the input layer, convolutional neural networks, recurrent neural networks and their variants are combined with it. Finally, this thesis puts forward the XLM-R-3-BGA model, which combines XLM-R-3 with a BiGRU network, uses the BiGRU as the main network to extract the context features of the text, and then applies Attention to highlight key words for the analysis of comments. Experimental results show that both the precision and the AUC value of XLM-R-3-BGA are higher than those of the initial model and other combination models.

4. The classification performance of the cross-lingual language model is compared with that of monolingual models. Google Translate was used to translate English text into Spanish and Italian, and the cross-lingual language model was then compared with Spanish BERT and Italian BERT, respectively. The results show that the accuracy and AUC value of the cross-lingual language model were higher than those of the monolingual BERT models.
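
The idea behind XLM-R-3, combining the last three of the twelve layers, can be illustrated with a small sketch. The abstract does not say how the layers are combined, so the token-wise concatenation used here is an assumption, as are the mean pooling step and the names `combine_last_layers` and `mean_pool`, which are hypothetical helpers rather than the thesis's code. The toy nested lists stand in for real per-layer hidden states, which a library such as Hugging Face Transformers can return when called with `output_hidden_states=True`.

```python
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token


def combine_last_layers(hidden_states: List[Layer], k: int = 3) -> Layer:
    """Concatenate, token by token, the vectors of the last k layers.

    Concatenation is an assumption; summing or averaging the layers
    would be equally plausible readings of "combine".
    """
    if k < 1 or k > len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    last_k = hidden_states[-k:]
    n_tokens = len(last_k[0])
    if any(len(layer) != n_tokens for layer in last_k):
        raise ValueError("all layers must cover the same tokens")
    combined: Layer = []
    for t in range(n_tokens):
        vec: Vector = []
        for layer in last_k:
            vec.extend(layer[t])  # append this layer's vector for token t
        combined.append(vec)
    return combined


def mean_pool(tokens: Layer) -> Vector:
    """Average the token vectors into one fixed-size sentence vector."""
    n = len(tokens)
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / n for d in range(dim)]


# Toy input: 12 layers, 2 tokens, hidden size 2; layer i is filled with i.
toy_hidden_states = [[[float(i)] * 2 for _ in range(2)] for i in range(1, 13)]

features = combine_last_layers(toy_hidden_states, k=3)  # layers 10, 11, 12
sentence_vector = mean_pool(features)
```

Under these assumptions, each token vector becomes three times as wide as a single layer's output. The pooled `sentence_vector` could then be passed to a conventional classifier such as a support vector machine, while the unpooled `features` sequence would be the natural input to a recurrent network such as a BiGRU.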
Keywords/Search Tags: Sentiment classification, Cross-lingual language model, XLM-R, Classification algorithm