
Research on Key Problems in Text Sentiment Analysis

Posted on: 2012-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: D Dan
GTID: 2178330335460633
Subject: Signal and Information Processing
Abstract/Summary:
With the rise and spread of Web 2.0, the volume of web text has grown rapidly. Sentiment analysis of text, generally treated as a text classification problem, has therefore become a focus of current research in text mining. Taking a machine learning approach, this thesis explores polarity classification and sentiment classification at different text granularities, as well as query formulation for topic-relevance retrieval. The main results are as follows.

First, we propose a CRF-based word-level model for classifying the emotional tendency of Chinese words, which quantifies the intensity of emotional words in four classes: joy, anger, sadness, and fear. The key issues are feature selection for text classification and the design of the CRF classification model. We analyze part-of-speech, syntactic parse, negation-word, transition-word, and degree-word features, as well as the use of position information. Experiments on the COAE evaluation corpus show that the method is effective in terms of precision and recall.

Second, we propose a sentence-level model for classifying the emotional tendency of Chinese sentences based on maximum entropy, which assigns each sentence to one of three types: positive, negative, or objective. We focus on unigram, bigram, negation-word, and degree-word features, and compare different feature-weighting schemes. Experiments on the COAE evaluation corpus again show that the method is effective in terms of precision and recall.

Sentiment analysis is closely related to retrieval technology. We therefore propose a semi-supervised CRF-based query builder and compare it with manual query adjustment and with a query builder based on unsupervised machine learning. Finally, we combine a document retrieval model with a paragraph retrieval model, which compensates for the loss in recall that accompanies the improved accuracy. Experiments on the Blog Track corpus show good results.
Keywords/Search Tags: text classification, sentiment analysis, CRF, maximum entropy, query formulation
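The abstract names the feature families used by the sentence-level maximum entropy model (unigrams, bigrams, negation words, degree words) but not their exact definitions or the training procedure. The following is a minimal sketch of that kind of classifier under stated assumptions, not the thesis's implementation: sentences are assumed to be already word-segmented, `NEGATION_WORDS` and `DEGREE_WORDS` are tiny hypothetical placeholder lexicons, the `neg_scope=` feature (the word right after a negator) is a hypothetical addition, features are plain binary indicators (the weighting schemes compared in the thesis are not reproduced), and training uses simple batch gradient ascent on the L2-regularised conditional log-likelihood rather than whatever optimizer the author used.

```python
import math

# Hypothetical cue lexicons for illustration only; the abstract does not
# list the negation or degree words actually used in the thesis.
NEGATION_WORDS = {"不", "没有", "别"}
DEGREE_WORDS = {"很", "非常", "太"}
LABELS = ("positive", "negative", "objective")


def extract_features(tokens):
    """Turn a word-segmented sentence into a set of binary feature names."""
    feats = {"bias"}
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        feats.add("uni=" + tok)                    # unigram feature
        if nxt is not None:
            feats.add("bi=" + tok + "_" + nxt)     # bigram feature
        if tok in NEGATION_WORDS:
            feats.add("has_negation")              # negation-word feature
            if nxt is not None:
                feats.add("neg_scope=" + nxt)      # hypothetical: word after the negator
        if tok in DEGREE_WORDS:
            feats.add("has_degree")                # degree-word feature
    return feats


class MaxEntClassifier:
    """Maximum entropy (multinomial logistic regression) classifier trained by
    batch gradient ascent on the L2-regularised conditional log-likelihood."""

    def __init__(self, labels=LABELS, lr=0.1, l2=0.01, epochs=500):
        self.labels = labels
        self.lr, self.l2, self.epochs = lr, l2, epochs
        self.w = {}  # (feature, label) -> weight; missing keys count as 0

    def _proba(self, feats):
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels]
        m = max(scores)                            # shift scores for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def predict_proba(self, tokens):
        return dict(zip(self.labels, self._proba(extract_features(tokens))))

    def predict(self, tokens):
        probs = self.predict_proba(tokens)
        return max(probs, key=probs.get)

    def fit(self, data):
        """data: list of (tokens, label) pairs with already-segmented tokens."""
        examples = [(extract_features(t), y) for t, y in data]
        n = len(examples)
        for _ in range(self.epochs):
            grad = {}
            for feats, gold in examples:
                for y, p in zip(self.labels, self._proba(feats)):
                    # observed feature count minus expected count under the model
                    delta = (1.0 if y == gold else 0.0) - p
                    for f in feats:
                        grad[(f, y)] = grad.get((f, y), 0.0) + delta
            # apply the averaged gradient minus the L2 penalty to every weight
            for key in set(grad) | set(self.w):
                g = grad.get(key, 0.0) / n - self.l2 * self.w.get(key, 0.0)
                self.w[key] = self.w.get(key, 0.0) + self.lr * g
        return self


if __name__ == "__main__":
    # Toy, hand-made sentences; not drawn from the COAE corpus.
    train = [
        (["这", "部", "电影", "很", "好看"], "positive"),
        (["服务", "非常", "好"], "positive"),
        (["这", "部", "电影", "不", "好看"], "negative"),
        (["服务", "太", "差"], "negative"),
        (["电影", "今天", "上映"], "objective"),
        (["会议", "在", "北京", "举行"], "objective"),
    ]
    clf = MaxEntClassifier().fit(train)
    print(clf.predict(["这", "部", "电影", "不", "好看"]))
```

Binary indicator features keep the sketch short; a weighting comparison like the one described in the abstract would replace the constant feature value 1 with a computed weight (for example a frequency-based score) inside `_proba` and `fit`.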