
Research On Chinese Text Sentiment Polarity Classification Based On Naive Bayesian

Posted on: 2011-05-21 | Degree: Master | Type: Thesis
Country: China | Candidate: D Yang | Full Text: PDF
GTID: 2178330332470297 | Subject: Computer software and theory
Abstract/Summary:
People's sentiment toward a subject is typically bipolar: positive or negative. Accordingly, text sentiment polarity classification (TSPC) is generally understood as the task of assigning a text to the positive or the negative class. It is a relatively new problem within text classification and has considerable commercial value, with applications in public opinion analysis, information filtering, product evaluation, product recommendation, intelligent search, and user interest mining.

Taking Naive Bayes (NB) as the object of study, this thesis examines key problems in TSPC, including corpus collection and annotation, semantic lexicon construction, feature selection, feature weighting, and vector representation of text. It proposes several new ideas, which were verified experimentally. The main work and results are as follows:

1. An automatic review-collection algorithm based on the DOM was designed to parse Chinese web pages containing hotel reviews, and was used to build a Chinese hotel review corpus (HR) of 7 million characters from the Internet. The corpus comes from a reliable source, carries clear sentiment, and is valuable for TSPC research. Chinese word segmentation (CWS) and sentiment polarity annotation were applied to the corpus text.

2. An approach based on pointwise mutual information (PMI) is proposed for building a hotel review semantic lexicon (HRSL) from HR, with seed words taken from BSL. An HRSL built with this approach gave better results when used in TSPC.

3. A new parameter setting for Laplace smoothing of the posterior probability is proposed, together with a new NB-based TSPC approach for Chinese. The approach applies the semantic lexicon to text processing and representation. In the experiments it proved efficient, accurate, and robust, and it outperformed other TSPC approaches based on CHI (the chi-square statistic) or on semantic orientation, which also makes it suitable for sentiment classification of large volumes of text.

4. A Chinese TSPC test system was developed; it is user-friendly, fast, and stable. Its functions include CWS, feature weight computation, CHI-based feature selection, semantic lexicon construction, and NB-based TSPC.
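
As an illustration of the NB classification with Laplace smoothing mentioned in item 3, the following is a minimal sketch of a multinomial Naive Bayes polarity classifier. It is not the thesis's method: it uses ordinary add-alpha smoothing (alpha = 1) rather than the new parameter setting the thesis proposes, it does no lexicon-based feature selection, and the function names (`train_nb`, `classify`) and the four toy hotel-review token lists are hypothetical. Input is assumed to be already segmented into words.

```python
import math
from collections import Counter

def train_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs  : list of (tokens, label) pairs, tokens already word-segmented
    alpha : add-alpha (Laplace) smoothing constant; 1.0 is the textbook
            default, NOT the parameter setting proposed in the thesis
    """
    class_docs = Counter()   # number of documents per class
    word_counts = {}         # per-class word frequency tables
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())
    return {
        "log_prior": {c: math.log(k / n_docs) for c, k in class_docs.items()},
        "word_counts": word_counts,
        "totals": {c: sum(wc.values()) for c, wc in word_counts.items()},
        "vocab": vocab,
        "alpha": alpha,
    }

def classify(model, tokens):
    """Return the class with the highest smoothed log posterior score."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        counts = model["word_counts"][label]
        denom = model["totals"][label] + alpha * vocab_size
        score = log_prior
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Smoothed estimate of P(tok | label); never zero
            score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data (segmented hotel-review snippets)
train = [
    (["房间", "干净", "服务", "好"], "pos"),
    (["位置", "方便", "服务", "好"], "pos"),
    (["房间", "脏", "服务", "差"], "neg"),
    (["隔音", "差", "房间", "小"], "neg"),
]
model = train_nb(train)
print(classify(model, ["服务", "好"]))
print(classify(model, ["房间", "脏"]))
```

Without smoothing, a word such as "脏" that never occurs in the positive class would give that class a zero probability and an undefined logarithm; adding alpha to every count keeps all estimates positive, which is why the choice of smoothing parameter can affect classification results.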
Keywords/Search Tags:Chinese text sentiment classification, Corpus collection, Feature selection, Naive Bayesian, Semantic lexicon