
Sentiment Analysis Of Weibo Based On Deep Learning

Posted on: 2021-02-13  Degree: Master  Type: Thesis
Country: China  Candidate: F Yang  Full Text: PDF
GTID: 2428330611489518  Subject: Mathematics
Abstract/Summary:
With the continuous development of Web 2.0 technology, mining valuable information from massive text data has become a pressing problem across industries. Traditional data mining mainly extracts meaningful information from structured data. The spread of the Internet has increased the amount of unstructured data, and extracting valuable information from it has become a focus of natural language processing. With the popularity of mobile devices, Weibo, an important social tool on those devices, generates new text data every day. Inferring the intentions and purposes of the authors of microblog posts from this data is an active research problem.

This thesis focuses on models for Chinese Weibo sentiment classification and applies the results to two Weibo datasets: "garbage classification" from 2019 and "follow on Novel coronavirus pneumonia" from January 2020. The experimental results demonstrate the validity of the proposed model.

In recent years, scholars in China and abroad have studied only positive and negative polarity in Weibo sentiment research. To address this limitation, this thesis proposes a model for microblog sentiment classification based on jieba segmentation, the Word2vec model, the long short-term memory (LSTM) network, the gated recurrent unit (GRU) network, and the Self-attention model. First, the cleaned text data is segmented with jieba, using a custom dictionary extended with new words and emoticons. After segmentation, a Word2vec model is trained to produce word vectors suited to the features of microblog text, converting Chinese words into finite-dimensional real-valued vectors. Second, the word vectors are fed into the LSTM and GRU networks to learn the sentiment features of the Weibo text datasets. Finally, each state of the LSTM and GRU networks is connected to the Self-attention mechanism and the output layer.

The experimental results on the two datasets show that:

1. Compared with a word vector model trained on ordinary text datasets, the Word2vec model trained on the language characteristics of Weibo describes the words in Weibo text more accurately.
2. Compared with the LSTM network model, the GRU network model converges more easily, trains in less time, and is better suited to microblog text sentiment classification.
3. By setting different propagation directions for the GRU layer, the contextual features of the text can be retained and controlled.
4. The Self-attention mechanism can effectively learn the text information in different parts of a sentence while also attending to the relationships among the words of the sentence itself.
5. Compared with a single LSTM network model or a single GRU network model, the hybrid neural network model is more effective in the sentiment classification task for Chinese Weibo.

In the microblog sentiment classification framework proposed in this thesis, Self-attention can compensate for the gradient problems of LSTM and GRU networks and improve the classification performance of the model. The hybrid deep learning model proposed here can serve as a reference for other, similar short-text classification tasks in Chinese.
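The final step described above, connecting each recurrent state through Self-attention, can be illustrated with a minimal sketch. The abstract does not specify the attention variant, so this sketch assumes scaled dot-product attention with no learned projection matrices (queries, keys, and values are all the hidden states themselves) followed by mean pooling. The function names and the hidden-state values are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention with Q = K = V = states.

    Assumption: no learned projections, unlike a trained attention layer.
    Returns the attended states and the attention weight matrix.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in states:
        # Similarity of this time step to every time step in the sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in states]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all states, one component at a time.
        outputs.append([sum(w * value[j] for w, value in zip(row, states))
                        for j in range(dim)])
    return outputs, weights


def mean_pool(vectors):
    """Average the attended states into a single sentence vector."""
    count = len(vectors)
    return [sum(v[j] for v in vectors) / count
            for j in range(len(vectors[0]))]


# Made-up hidden states for a 3-token sentence, standing in for the
# per-step outputs of the recurrent layers.
hidden_states = [
    [0.1, 0.3, -0.2, 0.5],
    [0.4, -0.1, 0.2, 0.0],
    [-0.3, 0.2, 0.6, 0.1],
]
attended, attention_weights = self_attention(hidden_states)
sentence_vector = mean_pool(attended)
```

In a full model, `sentence_vector` would be passed to the output layer for classification. Because every attended state is a direct weighted sum over all time steps, each word can draw on every other word without information having to pass through many recurrent steps.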
Keywords/Search Tags: sentiment analysis, Word2vec, long short-term memory network, gated recurrent unit network, Self-attention mechanism