Research On Sentiment Analysis Of Microblog Short Texts Based On Deep Learning

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Qin
GTID: 2428330611489045
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of network technology, the diversification of online culture, and the growing number of Internet users in China, social media has become an indispensable communication medium in daily life, and sentiment analysis of social media text has become an active research direction in natural language processing. Microblogs have become an important platform for the public to express opinions and emotions because they are simple, easy to use, and spread quickly, and they therefore generate a large amount of emotion-bearing text. Sentiment analysis of these microblog texts can support decision-making by governments, enterprises, and individuals. However, in the current Internet environment, people express their opinions and emotions in highly varied wording. To improve performance on the sentiment analysis task, this thesis studies a new word discovery method, a vector representation of short texts, and a sentiment classification model, taking into account two characteristics of short texts: the diversity of new words and the sparsity of text features. The main research contents are as follows.

(1) To address the problem that N-Gram-based new word discovery produces many junk strings, this thesis studies a new word discovery method for microblog short texts that combines statistics such as mutual information and left and right adjacency entropy with a stop word dictionary and a frequent word dictionary. Mutual information and adjacency entropy measure, respectively, the internal cohesion and the boundary freedom of the bigram and trigram candidates generated by N-Gram segmentation. Once the candidate word set is obtained, the final new word set is produced by filtering it with the stop word dictionary and the frequent word dictionary. Experimental results show that the proposed method is effective in finding new words on the NLPCC2014 microblog corpus.

(2) To address the sparse features and lack of semantics of microblog short texts, a vector representation of microblog short texts based on BERT is proposed. After preprocessing, the BERT model is used to embed the words of the short texts, which captures polysemy while transforming the texts into vectors and thus yields more accurate text representation vectors. Experimental results show that, compared with the vector representation generated by the Word2Vec-based CBOW model, the word vectors generated by BERT achieve better sentiment classification results.

(3) To address the problem that current deep learning-based sentiment analysis methods for microblog short texts fail to highlight the importance of emotion words or phrases when extracting emotional features, a BiGRU-Att model based on the BiGRU deep neural network and the attention mechanism is proposed. Experimental results show that, compared with CNN, BiLSTM and BiGRU, the proposed model effectively improves the accuracy of sentiment classification.
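
The candidate scoring and filtering described in item (1) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the character-level n-grams, the thresholds `min_pmi` and `min_entropy`, and the exact way the two dictionaries are applied are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter, defaultdict


def candidate_scores(texts, n=2):
    """Score each character n-gram by PMI and left/right adjacency entropy."""
    char_counts = Counter()
    gram_counts = Counter()
    left = defaultdict(Counter)   # gram -> counts of the character before it
    right = defaultdict(Counter)  # gram -> counts of the character after it
    for text in texts:
        char_counts.update(text)
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            gram_counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + n < len(text):
                right[gram][text[i + n]] += 1
    total_chars = sum(char_counts.values())
    total_grams = sum(gram_counts.values())

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    scores = {}
    for gram, count in gram_counts.items():
        p_gram = count / total_grams
        # Probability of the gram if its characters were independent.
        p_indep = math.prod(char_counts[ch] / total_chars for ch in gram)
        pmi = math.log(p_gram / p_indep)  # internal cohesion
        scores[gram] = (pmi, entropy(left[gram]), entropy(right[gram]))
    return scores


def discover_new_words(texts, stop_words, frequent_words,
                       min_pmi=1.0, min_entropy=0.5, n=2):
    """Keep cohesive, free-standing candidates, then apply dictionary filters."""
    found = []
    for gram, (pmi, h_left, h_right) in candidate_scores(texts, n).items():
        # Boundary freedom: both sides must have varied neighbours.
        if pmi < min_pmi or min(h_left, h_right) < min_entropy:
            continue
        # Assumed filter rule: drop known words and candidates containing stop words.
        if gram in frequent_words or any(ch in stop_words for ch in gram):
            continue
        found.append(gram)
    return sorted(found)
```

For example, `discover_new_words(["axyb", "cxyd"], set(), set())` keeps only `"xy"`, because it is the only bigram that appears with more than one distinct neighbour on both sides.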
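
To illustrate how an attention layer can highlight emotion-bearing tokens, as motivated in item (3), the sketch below shows a generic attention pooling step in plain Python. It is not the BiGRU-Att model itself: the scoring function (tanh of a dot product with a context vector), the name `attention_pool`, and the hand-written inputs are assumptions. In a real model, the hidden states would come from the BiGRU and the context vector would be learned.

```python
import math


def attention_pool(hidden_states, context):
    """Collapse per-token vectors into one sentence vector via softmax attention."""
    # Score each token: tanh of its dot product with the context vector.
    scores = [
        math.tanh(sum(h_d * c_d for h_d, c_d in zip(h, context)))
        for h in hidden_states
    ]
    # Softmax over tokens, shifted by the max score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the token vectors, one output value per dimension.
    dim = len(hidden_states[0])
    pooled = [
        sum(a * h[d] for a, h in zip(alphas, hidden_states))
        for d in range(dim)
    ]
    return alphas, pooled
```

Tokens whose hidden states align more strongly with the context vector receive larger weights, so they contribute more to the pooled sentence vector that is passed to the classifier.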
Keywords/Search Tags: Microblog sentiment analysis, Deep learning, New word discovery, BERT