
Study And Implementation On Fine-Grained Sentiment Analysis Techniques For Microblog

Posted on: 2015-12-07   Degree: Master   Type: Thesis
Country: China   Candidate: L P Fu   Full Text: PDF
GTID: 2348330482455606   Subject: Computer technology
Abstract/Summary:
With the growing popularity of social media platforms, more and more people publish comments and express opinions about public figures, events, and products on the Internet. Against this background, microblogging services, as an emerging form of social media, have become increasingly popular and attract large numbers of users around the world. The posts that users publish on these platforms reflect personal opinions and feelings, and they contain valuable information and rich sentiment. Sentiment analysis of microblogs therefore plays an important role in Web opinion analysis. However, most general sentiment classification methods consider only two coarse-grained categories (positive and negative) or three (positive, neutral, and negative), and so fail to capture users' fine-grained sentiments. This thesis therefore focuses on leveraging the rich emoticons in Weibo to analyze users' fine-grained sentiments.

Firstly, short-text similarity for fine-grained microblog sentiment analysis is studied. Because microblogs are short, the TF-IDF vectors of any two posts are very sparse and share few co-occurring keywords, so the similarity produced by the traditional vector space model is not ideal for short text. To reduce the effect of sparsity, this thesis learns topic-word probability distributions with the Latent Dirichlet Allocation (LDA) model. The topic relevance between the differing feature words of any two microblogs is then calculated and used to continually update the corresponding word vectors, after which the content similarity of the microblogs is computed as cosine similarity. Experiments show that this similarity calculation is effective. Moreover, based on emoticons, microblog sentiment is divided into five fine-grained categories: "happy", "love", "sad", "anxiety", and "angry".
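The topic-based similarity idea can be illustrated with a small sketch. The abstract does not give the exact update rule, so everything here is an assumption: the `TOPIC_WORD` table is a hand-written stand-in for a trained LDA model, word relevance is taken as the cosine of two words' topic profiles, and a word missing from one post receives the best relevance-weighted value from the words that post does contain. All function names are hypothetical.

```python
import math

# Hypothetical topic-word probabilities p(word | topic), standing in for
# the output of a trained LDA model (values invented for illustration).
TOPIC_WORD = {
    "film":  [0.30, 0.01],
    "movie": [0.28, 0.02],
    "actor": [0.20, 0.02],
    "rain":  [0.01, 0.35],
    "storm": [0.02, 0.30],
}

def cosine(u, v):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_relevance(w1, w2):
    """Assumed topic relevance of two words: cosine of their topic profiles."""
    if w1 not in TOPIC_WORD or w2 not in TOPIC_WORD:
        return 0.0
    return cosine(TOPIC_WORD[w1], TOPIC_WORD[w2])

def expand(own, other):
    """Give `own` a weight for each word it lacks but `other` contains."""
    expanded = dict(own)
    for missing in other:
        if missing not in own:
            expanded[missing] = max(
                (word_relevance(missing, w) * weight for w, weight in own.items()),
                default=0.0,
            )
    return expanded

def plain_similarity(doc_a, doc_b):
    """Baseline: cosine similarity of the raw sparse word-weight dicts."""
    vocab = sorted(set(doc_a) | set(doc_b))
    return cosine([doc_a.get(w, 0.0) for w in vocab],
                  [doc_b.get(w, 0.0) for w in vocab])

def topic_similarity(doc_a, doc_b):
    """Cosine similarity after topic-based expansion of both vectors."""
    return plain_similarity(expand(doc_a, doc_b), expand(doc_b, doc_a))
```

With this toy table, two posts that share no words, such as `{"film": 1.0}` and `{"movie": 1.0}`, have zero plain cosine similarity but a high topic-based similarity, while `{"film": 1.0}` and `{"rain": 1.0}` remain far apart.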
After data preprocessing and emotion tagging, the microblogs with typical emoticons are retained for the experiments.

Secondly, fine-grained sentiment analysis based on the traditional Naive Bayesian classifier is studied. After training on a real-world Weibo dataset, the probability of each sentiment category is calculated for each microblog and saved in a sentiment sequence. Experiments show that the Naive Bayesian method achieves good performance on the fine-grained sentiment analysis problem.

Thirdly, fine-grained sentiment analysis based on K-nearest neighbors (KNN) is studied. The fine-grained sentiment sequence of each microblog is predicted from its K most relevant microblogs, which are found using microblog similarity. Results on a real-world Sina Weibo dataset show that the traditional KNN method performs well in fine-grained sentiment classification.

Finally, based on the characteristics of the Bayesian and KNN methods, an approach that combines KNN with the Bayesian method is proposed. Using microblog similarity, the probability distribution of the sentiment label sequence of each microblog and the sentiment label sequences of all its K nearest neighbors are calculated, and from these the fine-grained sentiment sequence of each microblog is determined. Experiments show that the combined method outperforms the other baselines by a large margin in fine-grained sentiment classification.
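One way a Naive Bayesian posterior and a KNN vote could be combined into a ranked sentiment sequence is sketched below. The abstract does not specify the combination rule, so several choices here are assumptions rather than the thesis's method: the linear interpolation with weight `alpha`, the Laplace smoothing, and the Jaccard similarity (a simple stand-in for the topic-based similarity). The function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict

LABELS = ["happy", "love", "sad", "anxiety", "angry"]

def train_nb(docs):
    """Collect label and word counts from (tokens, label) pairs."""
    label_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def nb_posterior(tokens, model):
    """Multinomial Naive Bayes posterior over LABELS with Laplace smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in LABELS:
        log_p = math.log((label_counts[label] + 1) / (total_docs + len(LABELS)))
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                log_p += math.log((word_counts[label][t] + 1) / denom)
        log_scores[label] = log_p
    peak = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {l: math.exp(s - peak) for l, s in log_scores.items()}
    z = sum(unnorm.values())
    return {l: v / z for l, v in unnorm.items()}

def jaccard(a, b):
    """Assumed stand-in similarity: overlap of the two token sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_distribution(tokens, docs, k, sim=jaccard):
    """Similarity-weighted label distribution of the k nearest posts."""
    neighbours = sorted(docs, key=lambda d: sim(tokens, d[0]), reverse=True)[:k]
    votes = {l: 0.0 for l in LABELS}
    for n_tokens, n_label in neighbours:
        votes[n_label] += sim(tokens, n_tokens)
    z = sum(votes.values())
    if z == 0:  # no similar neighbour at all: fall back to uniform
        return {l: 1.0 / len(LABELS) for l in LABELS}
    return {l: v / z for l, v in votes.items()}

def combined_ranking(tokens, docs, model, k=3, alpha=0.5):
    """Rank labels by an assumed linear mix of the NB and KNN distributions."""
    nb = nb_posterior(tokens, model)
    knn = knn_distribution(tokens, docs, k)
    score = {l: alpha * nb[l] + (1 - alpha) * knn[l] for l in LABELS}
    return sorted(LABELS, key=score.get, reverse=True)

# Hypothetical emoticon-labelled training posts, already tokenized.
TRAIN = [
    (["great", "day", "smile"], "happy"),
    (["so", "happy", "smile"], "happy"),
    (["miss", "you", "heart"], "love"),
    (["cry", "alone", "tears"], "sad"),
    (["exam", "worried", "nervous"], "anxiety"),
    (["hate", "traffic", "furious"], "angry"),
]
MODEL = train_nb(TRAIN)
```

Calling `combined_ranking(["smile", "great"], TRAIN, MODEL)` returns the five labels ordered from most to least likely. Setting `alpha=1.0` reduces the sketch to Naive Bayes alone and `alpha=0.0` to KNN alone, which makes it easy to compare the three variants on the same data.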
Keywords/Search Tags: Microblog, Fine-Grained, Sentiment Analysis, Naïve Bayesian, K-Nearest Neighbor