A Topic-enhanced Recurrent Autoencoder Model For Sentiment Analysis Of Short Texts

Posted on: 2019-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Pang
GTID: 2428330563491729
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, more and more people express their opinions and emotions online. Sentiment analysis of short texts has therefore become a focus of research in text mining. Text vectorization is the foundation of machine-learning approaches to sentiment analysis. However, traditional text vectorization methods do not account for the fact that the sentiment carried by a word depends on its context. This thesis therefore proposes the Recurrent Autoencoder (RAE) model, which generates the vector representation of a short text by processing its words in their natural order. In addition, topic information is an effective indicator for sentiment classification, since a word may carry different sentiments under different topics. This motivates encoding topic information in the word representation used for sentiment analysis, so the Joint Sentiment-Topic (JST) model is used to extract the implicit topic and sentiment information from short texts.

The main contributions are as follows:

(1) Text vectorization plays an important part in machine-learning-based sentiment analysis. The RAE model proposed here combines recurrent neural networks with an autoencoder. It uses a recurrent neural network to integrate all the word embeddings of a short text in their natural order, and at each combination step an autoencoder minimizes the reconstruction error, so that the resulting text vector retains as much of the original text information as possible. Experimental results show that, compared with other models, the RAE model trains more efficiently and its text vectors yield higher sentiment classification accuracy, about 91.2% on average.

(2) Topic has an important effect on sentiment analysis of short texts, so the Joint Sentiment-Topic Recurrent Autoencoder (JST-RAE) model is proposed to combine the text vector with topic information. The JST-RAE model first uses the JST model to compute the joint topic-sentiment distribution, then uses the RAE model to build the text vector under the supervision of this distribution. The resulting text vector thus reflects the topic and sentiment information of the original short text. Comparative experiments show that text vectors generated by the JST-RAE model perform better in sentiment analysis of short texts.

(3) Short texts are semantically diverse, and traditional models have difficulty identifying negation and irony in them. A sentiment lexicon is therefore used to add feature dimensions to the text vectors, and different classifiers are then used for sentiment classification. Experimental results show that text vectors combined with the sentiment lexicon are effective for sentiment analysis of short texts and can address the problem of semantic diversity. The experiments also show that different classifiers achieve different sentiment classification accuracies on short texts.
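As an illustration of points (1) and (3), the following is a minimal sketch of how a recurrent autoencoder might fold word embeddings into a single text vector and how lexicon counts might be appended as extra feature dimensions. It is not the thesis's implementation: the abstract does not specify the activation function, the error measure, the parameter shapes, or the lexicon features, so the `tanh` encoder, the squared reconstruction error, and the names `init_params`, `rae_encode`, and `add_lexicon_features` are all assumptions made for this example. Training (gradient updates) and the JST supervision of point (2) are omitted.

```python
import numpy as np


def init_params(dim, seed=0):
    """Randomly initialise encoder/decoder weights (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(2 * dim)
    return {
        "W_enc": rng.normal(0.0, scale, size=(dim, 2 * dim)),
        "b_enc": np.zeros(dim),
        "W_dec": rng.normal(0.0, scale, size=(2 * dim, dim)),
        "b_dec": np.zeros(2 * dim),
    }


def rae_encode(word_vectors, params):
    """Fold word embeddings left to right into one vector.

    Returns (text_vector, total_reconstruction_error). The error is the sum
    of squared reconstruction errors over all combination steps (an assumed
    choice of loss).
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    if word_vectors.ndim != 2 or len(word_vectors) == 0:
        raise ValueError("expected a non-empty (num_words, dim) array")

    hidden = word_vectors[0]
    total_error = 0.0
    for x in word_vectors[1:]:
        # Combine the running text vector with the next word embedding.
        pair = np.concatenate([hidden, x])
        parent = np.tanh(params["W_enc"] @ pair + params["b_enc"])
        # Decode the parent and measure how well the pair is recovered.
        reconstruction = params["W_dec"] @ parent + params["b_dec"]
        total_error += 0.5 * float(np.sum((reconstruction - pair) ** 2))
        hidden = parent
    return hidden, total_error


def add_lexicon_features(text_vector, tokens, lexicon):
    """Append positive and negative word counts from a polarity lexicon."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return np.concatenate([text_vector, [pos, neg]])


if __name__ == "__main__":
    dim = 4
    tokens = ["not", "a", "good", "movie"]
    rng = np.random.default_rng(1)
    # Stand-in embeddings; a real system would use pretrained word vectors.
    embeddings = {w: rng.normal(size=dim) for w in tokens}
    params = init_params(dim)
    vec, err = rae_encode([embeddings[t] for t in tokens], params)
    features = add_lexicon_features(vec, tokens, {"good": 1, "bad": -1})
    print(vec.shape, features.shape, err)
```

In a trained model, the reconstruction error returned here would be the quantity minimized at each combination step, and the augmented feature vector would be passed to whichever classifier is being compared.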
Keywords/Search Tags: Short texts, Topic and sentiment analysis, Recurrent autoencoder, JST-RAE model, Sentiment lexicons