
Research Of Joint Topic Sentiment Analysis Based On Word Embeddings Probability Model

Posted on: 2018-01-06  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Wu  Full Text: PDF
GTID: 2348330536456296  Subject: Computer Science and Technology
Abstract/Summary:
The rapid development and popularization of the Internet has brought users great convenience. Users can easily browse and comment on online information such as goods, reviews, news, and services, and their comments reflect their emotional attitude toward the topic being discussed, so effectively analyzing and mining this comment information is highly valuable. However, most existing methods for joint topic-sentiment analysis of online reviews rely only on attributes of the data itself: they treat a single word as both the basic processing unit and a single semantic entity, base the corresponding probability calculations on word frequency statistics, ignore the dependencies between words in the text, and adopt a fixed sentiment dictionary. Such methods capture too little semantic information to express complex semantic relations and are difficult to apply in practice. To address these issues, this thesis takes users' online comment text as the research object and combines a probabilistic topic model, word embeddings, and neural networks to propose a new joint topic-sentiment analysis model that identifies topics and analyzes sentiment simultaneously. The main work consists of the following two parts.

(1) We propose a weakly supervised topic-sentiment joint model with word embeddings (WS-TSWE). Most existing joint topic-sentiment models base their probability calculations on word frequency statistics. When the training corpus is small or the documents are short, the textual features become sparse and high-dimensional; because word frequency statistics alone carry little semantic information about the topic and sentiment distributions, the resulting sentiment and topic distributions can be unsatisfactory. In addition, relying on domain-specific prior sentiment knowledge to identify the positive and negative words in the corpus limits the effect of the sentiment prior and ultimately makes the sentiment distribution over the sentiment aspects inaccurate. In this thesis, starting from latent Dirichlet allocation (LDA), we add a sentiment layer together with word embedding representations learned from an external corpus, and use a Bernoulli distribution to combine the sentiment-topic-word distribution generated by word frequency statistics with a Softmax function over the embeddings; the resulting joint probability distribution is then trained to obtain the sentiment-topic-word generation process. The model further computes the sentiment tendency of each word in the corpus with the HowNet lexicon combined with context: it first derives a sentiment value for the word from HowNet and then updates this value during model training using the context information, which improves the accuracy of the joint topic-sentiment model.
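The thesis itself does not include code, but the core idea of the Bernoulli combination can be illustrated with a minimal NumPy sketch: for each (sentiment, topic) pair, the word distribution is a mixture of a count-based (Dirichlet-smoothed) component from word frequency statistics and a Softmax component over pretrained word embeddings. All names and shapes below (lambda_mix, topic_vecs_ls, the 0.6 mixing weight, etc.) are illustrative assumptions, not the thesis's own notation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def sentiment_topic_word_dist(counts_skw, word_vecs, topic_vecs_ls,
                              beta=0.01, lambda_mix=0.6):
    """Mixture word distribution for every (sentiment, topic) pair.

    counts_skw    : (S, K, V) word counts assigned to sentiment s and topic k
    word_vecs     : (V, D) pretrained word embeddings from an external corpus
    topic_vecs_ls : (S, K, D) learned sentiment-topic embedding vectors
    beta          : symmetric Dirichlet prior for the count-based component
    lambda_mix    : Bernoulli parameter selecting the embedding component
    Returns an (S, K, V) array of p(word | sentiment=s, topic=k).
    """
    # Count-based (Dirichlet-smoothed) component from word frequency statistics.
    phi_counts = (counts_skw + beta) / (
        counts_skw.sum(axis=-1, keepdims=True) + beta * counts_skw.shape[-1])
    # Embedding-based component: Softmax over dot products with the
    # sentiment-topic vector, which injects external semantic information.
    scores = np.einsum('skd,vd->skv', topic_vecs_ls, word_vecs)
    phi_embed = softmax(scores)
    # Bernoulli mixture of the two components.
    return (1.0 - lambda_mix) * phi_counts + lambda_mix * phi_embed

# Toy usage: 2 sentiment labels, 3 topics, 5 vocabulary words, 4-dim embeddings.
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(2, 3, 5)).astype(float)
p = sentiment_topic_word_dist(counts, rng.normal(size=(5, 4)),
                              rng.normal(size=(2, 3, 4)))
print(p.shape, p.sum(axis=-1))  # each (sentiment, topic) row sums to 1
```

In the actual model the mixing weight and the sentiment-topic vectors would be estimated during inference rather than fixed as in this toy example.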
(2) We propose a topic-sentiment joint model with word embedding dependence (RTSWE). Although WS-TSWE extends the semantic information of each word, it does not consider the dependencies among words during word generation, so the semantic information of different words remains relatively independent. The thesis therefore takes the sentiment-topic embeddings obtained by WS-TSWE and models long-distance dependencies between words with the recurrent neural networks LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit). Finally, we compute the probability of generating a word given both the sentiment topic and the sequential dependency on the preceding words, which makes full use of the characteristics of LSTM and GRU: it integrates the sequential information between words and enriches their semantic information. On this basis the thesis redefines the sentiment-topic-word distribution, making the results of topic recognition and sentiment analysis more accurate.
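As a rough illustration of this second part, the sketch below (PyTorch, not from the thesis) feeds the sentiment-topic word embeddings into a GRU and produces a position-wise word distribution conditioned on the preceding words; swapping nn.GRU for nn.LSTM gives the LSTM variant. The class name, hidden size, and the choice to freeze the embeddings are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class SequentialWordModel(nn.Module):
    """GRU over pretrained sentiment-topic word embeddings (illustrative sketch).

    The recurrent state captures long-distance dependencies between words; the
    output layer gives log p(next word | preceding words) for each position.
    """
    def __init__(self, embedding_matrix, hidden_size=128):
        super().__init__()
        vocab_size, emb_dim = embedding_matrix.shape
        # Embeddings assumed to come from WS-TSWE; kept frozen here.
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.rnn = nn.GRU(emb_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) word indices of a review
        h, _ = self.rnn(self.embed(word_ids))           # (batch, seq_len, hidden)
        return torch.log_softmax(self.out(h), dim=-1)   # per-position word log-probs

# Toy usage: vocabulary of 10 words with 8-dimensional sentiment-topic embeddings.
emb = torch.randn(10, 8)
model = SequentialWordModel(emb)
log_probs = model(torch.randint(0, 10, (2, 6)))   # two reviews, six words each
print(log_probs.shape)                            # torch.Size([2, 6, 10])
```

In the full model this sequence-conditioned probability would be combined with the sentiment-topic-word distribution from part (1) to define the redefined generation probability; the exact combination is described in the thesis body rather than here.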
Keywords/Search Tags: Probability Topic Model, Word Embeddings, Neural Network, Topic Recognition, Sentiment Analysis