Sentiment analysis (SA) is an important branch of natural language processing (NLP). With the development of the times, the Internet has become the primary platform for social networking. It carries a large amount of data, much of which consists of people's comments on and attitudes toward current events, and studying these texts that express attitudes and emotions is of great theoretical and practical significance. Sentiment analysis is widely used in public opinion analysis, product recommendation, social services, and other fields. Current sentiment analysis methods fall mainly into three categories: methods based on sentiment dictionaries, methods based on traditional machine learning, and methods based on deep learning. Deep learning networks have a unique memory capability, which has made them the mainstream approach to sentiment analysis. In recent years, pre-trained language models based on deep learning have been developed and have markedly improved sentiment analysis performance. This paper studies sentiment analysis of Chinese and English texts based on pre-trained language models. The main research contents and results are as follows:

(1) Word order is an important feature of text. Aiming at the problem that existing methods do not capture the order of words in a sentence accurately, this paper studies word-order embedding in depth, proposes an improved masked self-attention method based on the attention mechanism, and constructs a text sentiment analysis model, MSABERT. First, the masked self-attention method learns the positional information of the word sequence. Second, sentiment-related feature information is extracted with the BERT model. Finally, a feed-forward neural network maps the text vector to sentiment categories. Experiments on an English binary-classification dataset show that this method not only learns word position information but also achieves good classification performance.

(2) Although some sentiment words all express positive or negative emotions, the intensity of the emotion they express differs, and some words also differ in how they express emotion. To address this problem, this paper proposes a text sentiment analysis model (MSABERT_CLHAN) based on a contrastive-learning pre-trained language model. The MSABERT model is used to extract text features, and contrastive learning is used for pre-training on corpora that express similar emotions but have different emotional polarities. The proposed MSABERT_CLHAN model is tested on an English five-class dataset, and the experimental results demonstrate the effectiveness of the model.

(3) Aiming at the uncertainty of Chinese word segmentation and the instability of the model, this paper constructs a text sentiment analysis model based on WoBERT_CL. First, the WoBERT model, which operates on Chinese words, is used to identify Chinese words, effectively reducing ambiguity in meaning. Second, R-Drop, a contrastive-learning-style regularization method, is applied during the data processing of the WoBERT pre-trained model, which effectively addresses the inconsistency between different outputs obtained from the same input during model training. Finally, the WoBERT_CL model is used for sentiment analysis on Chinese text datasets. The experimental results show that, compared with traditional neural network models, the proposed WoBERT_CL model effectively improves the robustness of text sentiment analysis and is more accurate in recognizing highly similar Chinese texts.
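To make the pipeline described in (1) concrete, the following is a minimal sketch of a BERT-based sentiment classifier with a feed-forward classification head, assuming the HuggingFace transformers library. It is not the thesis implementation: the bert-base-uncased checkpoint, the binary label setup, and the hidden layer size are illustrative assumptions, and the improved masked self-attention component proposed in this paper is omitted.

```python
# Illustrative sketch only (not the thesis code): BERT features -> feed-forward head -> sentiment class.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertSentimentClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-uncased", num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        # Feed-forward head mapping the pooled text vector to sentiment categories.
        self.classifier = nn.Sequential(
            nn.Linear(self.bert.config.hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Use the pooled [CLS] representation as the sentence-level feature vector.
        return self.classifier(outputs.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertSentimentClassifier()
batch = tokenizer(["The film was a delight."], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
pred = logits.argmax(dim=-1)  # assumed label order: 0 = negative, 1 = positive
```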
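R-Drop, mentioned in (3), is a published consistency-regularization technique: the same batch is passed through the model twice with dropout active, and a symmetric KL term penalizes the divergence between the two output distributions, which encourages consistent predictions for the same input. The sketch below is illustrative rather than the thesis code; it assumes a classifier with the signature of the sketch above, and the weight alpha is an assumed hyperparameter.

```python
# Illustrative sketch of the R-Drop consistency loss (not the thesis code).
import torch
import torch.nn.functional as F

def r_drop_loss(model, input_ids, attention_mask, labels, alpha=4.0):
    # Two stochastic forward passes over the same input see different dropout masks.
    logits1 = model(input_ids, attention_mask)
    logits2 = model(input_ids, attention_mask)

    # Standard cross-entropy on both passes.
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)

    # Symmetric KL divergence pulling the two predicted distributions together.
    kl12 = F.kl_div(F.log_softmax(logits1, dim=-1),
                    F.softmax(logits2, dim=-1), reduction="batchmean")
    kl21 = F.kl_div(F.log_softmax(logits2, dim=-1),
                    F.softmax(logits1, dim=-1), reduction="batchmean")

    return ce + alpha * (kl12 + kl21) / 2
```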