Extended Topic Model For The Sentimental Classifications Of Chinese Online Reviews

Posted on:2016-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:J Wang

Full Text:PDF

GTID:2308330467482353

Subject:Computer technology

Abstract/Summary:

The arrival of the big data age has brought not only challenges but also opportunities. With the rise of social networking sites such as Twitter and Facebook, people post a massive number of comments carrying sentiment information on the Internet. Unlike objective texts such as news and blogs, these comments are more subjective and reflect public opinion on their targets, which makes them a valuable reference for potential users, merchants, and governments. For example, before starting a trip or going shopping, we now routinely search the Internet for comments from people who have already been there or bought the product, because their experience with the routes and products helps us plan and make the final decision. For potential users, these comments can be a main factor in deciding whether to use a product. Merchants need this kind of feedback to improve their products or services and to learn about their competitors. For governments, the comments offer a way to understand how people think about recent events and to make new policies based on those views.

This paper proposes a new topic-sentiment mixture classification model based on the characteristics of the LDA model. Each comment contains one person's opinions on some events, and those opinions are independent of other comments, so we can treat each comment as one document and sample topics from it. We also assume that each sentence in a comment carries one sentiment, either positive or negative, which is why we sample a sentiment label from every sentence of a comment. This yields a sentiment distribution over the document, which we use for sentiment classification. To obtain the overall sentiment tendency of the corpus, it suffices to compute the sentiment of every document.

Traditional clustering methods such as k-means and k-medoids require the number of clusters k as input. Some other methods are very time-consuming and do not fit the setting of this paper. I therefore propose a new method based on the Distance-Dependent Chinese Restaurant Process (DDCRP) model under the non-parametric Bayesian framework, using the semantic similarity between terms and words as the "distance" in the DDCRP model. The DDCRP model outputs a suitable number of clusters no matter how the data grows. Experiments show that the DDCRP model works well on all of my data sets and also works well with the LSS model, although the topic number that DDCRP finds does not yield the best performance. Future work is to make sure DDCRP finds the topic number with the highest accuracy under the LSS model.

In this paper, I first analyze the urgency of sentiment mining in the big data age. I then discuss some characteristics of the comments and propose a new topic-sentiment mixture classification model that samples topics and sentiments from comments. To remove the LSS model's need for a topic number as input, a new method is also proposed. Experiments show that the two models work well together.
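
To illustrate why a DDCRP-style model does not need a cluster number as input, here is a minimal sketch of drawing one partition from a distance-dependent Chinese restaurant process prior. It is not the thesis's implementation. The function name `sample_ddcrp_partition`, the exponential decay `exp(-d / scale)`, and the parameters `alpha` and `scale` are assumptions made for illustration, and the sketch draws from the prior only, with no posterior inference over text.

```python
import math
import random


def sample_ddcrp_partition(distances, alpha=1.0, scale=1.0, seed=0):
    """Draw one partition of n items from a ddCRP prior (illustrative only).

    distances[i][j] is a non-negative distance between items i and j.
    The decay function exp(-d / scale) is an assumed choice.
    """
    rng = random.Random(seed)
    n = len(distances)

    # Each item i links to another item j with weight f(d_ij),
    # or to itself with weight alpha.
    links = []
    for i in range(n):
        weights = [
            alpha if j == i else math.exp(-distances[i][j] / scale)
            for j in range(n)
        ]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])

    # Clusters are the connected components of the link graph (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical distances: items 0 and 1 are close, item 2 is far from both.
example = [[0.0, 0.1, 5.0], [0.1, 0.0, 5.0], [5.0, 5.0, 0.0]]
print(sample_ddcrp_partition(example, alpha=0.5))
```

The number of clusters is never passed in: it falls out of how many connected components the sampled links form, which is the property the abstract relies on when it contrasts DDCRP with k-means and k-medoids.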

Keywords/Search Tags:

Sentiment classification, Chinese comments, LDA, Non-parametric Bayesian, Topic model

Related items

1	The Sentiment Analysis Of The Comments Of The E-commerce Goods
2	Research On Chinese Text Sentiment Polarity Classification Based On Naive Bayesian
3	Research On Chinese Text Sentiment Polarity Classification
4	Sentiment Analysis Of Chinese Reviews Based On Hotel Comments
5	A Construction Of Sentiment Topic Model-Based On Non-Parametric Bayesian Methods
6	Text Sentiment Analysis Of Chinese Comments For Online Public Opinion
7	Research On Sentiment Classification Model Based On Web Comments
8	Research And Application Of Sentiment Classification Technology Based On Web Comments
9	Research On Sentiment Analysis Method Based On Product Comments
10	Research On Sentiment Classification Of Multilingual Network Comments Based On XLM-R