
Research On Sentiment Label Extraction

Posted on: 2011-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Liu
Full Text: PDF
GTID: 2178330338479943
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Web 2.0 technology, the Internet has produced a large number of online reviews, and more and more researchers have focused on extracting useful information from them. Mining opinion information from reviews not only helps consumers make purchasing decisions but also helps manufacturers learn of users' suggestions in time. A sentiment label is a combination of a polarity word and its related target; it carries detailed information from a user review and can effectively reflect the review's core content. In this paper, we deal with three key issues in sentiment labels: polarity word set construction, target extraction, and sentiment label extraction.

For polarity word set construction, the goal of this paper is to build an accurate and comprehensive polarity word set. We first combine a semantic knowledge base with a large-scale corpus to obtain a candidate polarity word set. We then collect the contexts of each candidate polarity word in the corpus and use them to assign the candidate a confidence score, which reflects the probability that it is a genuine polarity word. Finally, we select the candidates with high confidence to form the polarity word set. We used this polarity word set to participate in Task 1 of the first COAE and obtained good results.

For target extraction, we first obtain a candidate target set using phrase structure and then propose several target filtering algorithms. First, because targets are domain-dependent, we filter the candidate targets using PMI (pointwise mutual information) computed by web mining. Second, because noun targets contained in phrase targets are usually redundant, we remove them with a noun pruning algorithm. Once the target set is constructed, we classify the sentences in the reviews and then find the appraised targets in the reviews based on the target set.
The system based on this method participated in Task 3 of the first COAE and obtained good results.

For sentiment label extraction, this paper proposes a novel method that uses syntactic paths to recognize sentiment labels automatically. Using the syntactic path to capture the relationship between a polarity word and a target avoids the nearest approach's excessive reliance on empirical assumptions. Constructing the syntactic path set automatically addresses the problem that manually constructed rules are always incomplete. In addition, we use edit distance when matching syntactic paths, which effectively improves the system's recall.

Finally, to address the problem that traditional methods cannot extract implicit sentiment labels, this paper attempts to use a topic model to annotate text with sentiment labels and proposes two methods, one based on PMI and one based on probability distribution similarity. The results show that the topic model can play a role in implicit sentiment label extraction. We conclude by analyzing the problems that remain when topic models are applied to sentiment label extraction.
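
The idea of matching syntactic paths with edit distance can be illustrated with a minimal sketch. It is not the thesis's implementation: the path representation (a list of constituent labels between target and polarity word), the unit edit costs, the threshold `max_distance`, the function names, and the example labels are all assumptions made for illustration.

```python
from typing import Iterable, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def matches_known_path(path: Sequence[str],
                       known_paths: Iterable[Sequence[str]],
                       max_distance: int = 1) -> bool:
    """Accept a candidate path if it is close enough to any known path.

    max_distance is a hypothetical tolerance; 0 means exact matching only.
    """
    return any(edit_distance(path, known) <= max_distance
               for known in known_paths)


# Hypothetical path set and candidate (labels are illustrative only).
known_paths = [["NN", "NP", "IP", "VP", "VA"]]
candidate = ["NN", "NP", "IP", "VP", "VP", "VA"]  # one extra VP node

exact_only = matches_known_path(candidate, known_paths, max_distance=0)
fuzzy = matches_known_path(candidate, known_paths, max_distance=1)
```

In this sketch, exact matching rejects the candidate because of the extra node, while a tolerance of one edit accepts it. This is the sense in which approximate matching can raise recall; the tolerance would need tuning, since a looser threshold also admits more spurious pairs.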
Keywords/Search Tags: Sentiment Analysis, Sentiment Labels, Target, Polarity Word, Syntactic Path