Research On Sentiment Classification For Chinese Microblog Text

Posted on: 2015-03-14 | Degree: Master | Type: Thesis
Country: China | Candidate: R Du | Full Text: PDF
GTID: 2298330431982502 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet media, microblogs have been widely used for information dissemination and information sharing. On microblog platforms, a large number of users generate hundreds of millions of texts. These texts contain a variety of views and attitudes that have potential applications in public opinion monitoring, hotspot detection, and so on. The research purpose of this paper is to use natural language processing technology to identify opinion sentences in microblogs and to analyze the sentiment orientation of subjective microblogs. This paper studies sentiment classification of Chinese microblogs, and the research work is as follows.

(1) A method for automatically constructing an emotion lexicon for microblogs is studied. First, the applicability of existing emotion lexicons to sentiment classification of microblogs is analyzed. Because the coverage of these lexicons is low, a basic microblog emotion lexicon is constructed by integrating existing emotion lexicon resources. A smoothed SO-PMI algorithm is then proposed to estimate the sentiment orientation of words that are not included in the basic lexicon. The microblog emotion lexicon is constructed from the basic lexicon together with the smoothed SO-PMI algorithm. Finally, the microblog emotion lexicon is used for sentiment classification of microblogs. The experimental results show that the microblog emotion lexicon has better applicability in sentiment classification of microblogs.

(2) The classification of Chinese microblog text as subjective or objective is studied. To address the low accuracy of subjective/objective classification of Chinese microblogs, lexicons and statistical analysis are used to extract candidate subjective features, and a feature selection algorithm based on rough sets and probability weighting is proposed. Using this algorithm, opinion words, exclamation marks, adjectives, degree words, network words, and modal particles are selected as the classification features. These features are then used in a subjective/objective classification experiment. The experimental results show that the features selected by the algorithm achieve good results in subjective/objective classification of Chinese microblogs.

(3) A method of emotional feature selection for subjective microblog text is studied. First, candidate sentiment features are extracted by part of speech, and the microblog emotion lexicon is used to filter out noisy features that are not emotion words. Because the chi-square algorithm suffers from local instability when selecting sentiment features in microblogs, a chi-square-tfidf algorithm is proposed. With this algorithm, the emotion features for sentiment classification are selected from the filtered candidate features. Finally, experiments are designed to validate the local stability and effectiveness of the algorithm. The experimental results show that the proposed algorithm has better stability. When the feature dimension is 300, the accuracy is 0.794, which is better than both the information gain algorithm and sentiment classification based on the microblog emotion lexicon.
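
To make the SO-PMI idea from part (1) concrete, the following is a minimal Python sketch, not the thesis's implementation. SO-PMI scores a word by how strongly it co-occurs with positive seed words compared with negative seed words. The abstract does not say how the smoothing is done, so simple additive smoothing of the counts (the hypothetical parameter `alpha`) is assumed here. The function names, the seed words, and the toy pre-segmented corpus are all invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count, per document, how often each word and each unordered word pair appears."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))             # each word counted once per document
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored in sorted order
    return word_df, pair_df


def smoothed_pmi(w, s, word_df, pair_df, n_docs, alpha=1.0):
    """PMI of two words from document co-occurrence, with additive smoothing.

    ASSUMPTION: adding alpha to every count is one simple way to avoid log(0);
    the thesis's actual smoothing scheme is not given in the abstract.
    """
    pair = (w, s) if w < s else (s, w)
    joint = pair_df[pair] + alpha
    return math.log2(joint * n_docs / ((word_df[w] + alpha) * (word_df[s] + alpha)))


def so_pmi(word, pos_seeds, neg_seeds, word_df, pair_df, n_docs, alpha=1.0):
    """Sentiment orientation: PMI with positive seeds minus PMI with negative seeds."""
    pos = sum(smoothed_pmi(word, s, word_df, pair_df, n_docs, alpha) for s in pos_seeds)
    neg = sum(smoothed_pmi(word, s, word_df, pair_df, n_docs, alpha) for s in neg_seeds)
    return pos - neg


# Toy, already-segmented "microblogs" (hypothetical data for illustration only).
DOCS = [
    ["geili", "good", "happy"],
    ["geili", "good"],
    ["keng", "bad", "angry"],
    ["keng", "bad"],
    ["good", "happy"],
    ["bad", "angry"],
]
POS_SEEDS = ["good", "happy"]  # words assumed to be in the basic lexicon
NEG_SEEDS = ["bad", "angry"]

word_df, pair_df = build_counts(DOCS)
for candidate in ("geili", "keng"):
    score = so_pmi(candidate, POS_SEEDS, NEG_SEEDS, word_df, pair_df, len(DOCS))
    label = "positive" if score > 0 else "negative"
    print(candidate, label)
```

A word with a positive score would be added to the lexicon as positive and a word with a negative score as negative. In practice a threshold around zero is often used to leave weakly oriented words unlabeled, though whether the thesis does so is not stated.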
Keywords/Search Tags: Chinese microblog, sentiment classification, emotion lexicon, rough set, sentiment feature