
Extended Topic Model For The Sentimental Classifications Of Chinese Online Reviews

Posted on: 2016-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2308330467482353
Subject: Computer technology
Abstract/Summary:
The arrival of the big data age has brought us opportunities as well as challenges. With the rise of SNS websites such as Twitter and Facebook, people post massive numbers of comments containing sentiment information on the Internet. Unlike objective text such as news and blogs, these comments are largely subjective and reflect public opinion on the targets they discuss, which makes them a valuable reference for potential users, merchants and governments. For example, before starting a trip or going shopping, we now habitually search the Internet for comments from people who have already been there or bought the product we need, because their experience of the routes and the products helps us draw up a plan and make the final decision. For potential users, these comments can be a main factor in deciding whether to use a certain product. Merchants need this kind of feedback to improve their products or services and to learn about their competitors. For governments, the comments are a way to understand what people think about recent events and to make new policies based on those views.

In this paper, a new Topic-Sentiment mixture classification model is proposed, based on the characteristics of the LDA model. Each comment contains one person's opinions on some events and is independent of the other comments, so we can treat each comment as one document and sample topics from it. We further assume that each sentence in a comment carries exactly one sentiment, either positive or negative. For this reason we sample a sentiment label for every sentence of a comment. This finally gives a sentiment distribution over the document, which we use for sentiment classification. Furthermore, if we want to know the overall sentiment tendency of the corpus, we can simply compute the sentiment of every document.

Traditional clustering methods such as k-means and k-medoids need a cluster number k as input in order to partition the data into k clusters. Some other methods are very time-consuming and do not fit the situation in this paper well. In this paper, I propose a new method based on the Distance-Dependent Chinese Restaurant Process model (DDCRP model) under the non-parametric Bayesian framework, and I use the semantic similarity between terms and words as the "distance" in the DDCRP model. The DDCRP model can always output a suitable number of clusters no matter how the data grows. Experiments show that the DDCRP model works well on all my data sets and that it also works well with the LSS model, although the topic number that DDCRP finds does not yield the best performance. Future work is to make sure that DDCRP can find the topic number that gives the highest accuracy under the LSS model.

In the paper, I first analyze the urgency of sentiment mining in the big data age. I then discuss some characteristics of the comments and propose a new Topic-Sentiment mixture classification model that samples topics and sentiments from comments. Then, to fix the problem that the LSS model needs the topic number as input, a new method is also proposed. Experiments show that the two models work well together.
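
The way a distance-dependent Chinese restaurant process yields a cluster count without a preset k can be illustrated with a small sketch. This is not the thesis's implementation: it only draws one sample from a ddCRP prior, with no posterior inference. It assumes that the pairwise semantic similarity is used directly as the link weight and that `alpha` is the self-link weight; the function names and the similarity matrix format are hypothetical. Each item links to one other item or to itself, and the clusters are the connected components of the resulting link graph.

```python
import random


def sample_ddcrp_links(similarity, alpha, rng):
    """Draw one link per item from a ddCRP-style prior.

    Item i links to item j (j != i) with weight similarity[i][j],
    or to itself with weight alpha. Assumption: the similarity
    value is used directly as the decay-weighted "distance" term.
    """
    n = len(similarity)
    links = []
    for i in range(n):
        weights = [alpha if j == i else similarity[i][j] for j in range(n)]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def clusters_from_links(links):
    """Return clusters as connected components of the link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x):
        # Follow parents to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for i in range(len(links)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch the number of clusters is simply `len(clusters_from_links(links))`, so it is an outcome of the sampled links and not an input. A larger `alpha` makes self-links more likely and therefore tends to produce more, smaller clusters.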
Keywords/Search Tags: Sentiment classification, Chinese comments, LDA, Non-parametric Bayesian, Topic model