Study On Chinese Text Sentiment Classification

Posted on: 2015-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: L S Hua
Full Text: PDF
GTID: 2298330422472587
Subject: Computer system architecture
Abstract/Summary:
With the rise of microblogging and e-commerce in recent years, the number of users and online comments has grown explosively. These comments contain judgments and analysis of products, trending events, and so on, and are of great value for improving products and for government monitoring of public opinion. Text sentiment classification has therefore become a research hotspot in recent years.

Text sentiment classification is a binary classification task that determines whether a text is positive or negative. Because emotional expression is complex, this dissertation discusses in detail which parts of speech carry more sentiment information and contribute more to classification. We also improve a cross-domain sentiment classification method that combines learning-based and lexicon-based techniques. Our main work and contributions are:

① We investigate the influence of stop words on text sentiment classification, where the stop-word lists are built from words of different parts of speech. We run detailed experiments and analysis on the lexicon-based and learning-based techniques using seven kinds of stop-word lists and corpora from three domains. For the lexicon-based method, treating all words except adjectives, adverbs, and verbs as stop words generally gives better results, while the stop words used in traditional topic classification have little or no effect on sentiment classification. For the learning-based method, adjectives, adverbs, verbs, and nouns are the most important, and using no stop words at all gives the best result.

② We improve a cross-domain sentiment classification method that combines learning-based and lexicon-based techniques. Approaches to Chinese text sentiment classification are generally based either on sentiment knowledge or on feature selection. The former needs no labeled text and is simple and easy to implement, but its accuracy is low. The latter achieves high accuracy, but it needs a large amount of labeled text, which makes it poorly suited to cross-domain sentiment classification. Tan et al. propose a novel scheme for sentiment classification that combines the lexicon-based and learning-based techniques; it needs no labeled text yet gives good results. In this dissertation we use a sentiment lexicon constructed with the PMI (Pointwise Mutual Information) algorithm to replace the corresponding part of the original algorithm. The results show that this gives better accuracy. We then analyze in detail the impact on the results and the algorithm parameters involved.
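To make the PMI-based lexicon idea concrete, here is a minimal sketch of one common formulation: each word is scored by its PMI with a set of positive seed words minus its PMI with a set of negative seed words, using document-level co-occurrence counts. The abstract does not say how the dissertation builds its lexicon, so the seed words, the smoothing constant, the function names `build_pmi_lexicon` and `classify`, and the toy corpus are all assumptions made for illustration. The input is assumed to be Chinese text that has already been word-segmented.

```python
import math
from collections import Counter


def build_pmi_lexicon(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score words by PMI with positive seeds minus PMI with negative seeds.

    docs is a list of token lists. All names and the smoothing value are
    illustrative assumptions, not taken from the dissertation.
    """
    n_docs = len(docs)
    seeds = set(pos_seeds) | set(neg_seeds)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both word and seed

    for doc in docs:
        words = set(doc)
        word_df.update(words)
        for seed in words & seeds:
            for word in words - seeds:
                pair_df[(word, seed)] += 1

    def pmi(word, seed):
        # Additive smoothing keeps the logarithm defined for unseen pairs
        # and for seeds that never occur in the corpus.
        p_joint = (pair_df[(word, seed)] + smoothing) / n_docs
        p_word = word_df[word] / n_docs
        p_seed = (word_df[seed] + smoothing) / n_docs
        return math.log2(p_joint / (p_word * p_seed))

    lexicon = {}
    for word in word_df:
        if word in seeds:
            continue
        pos_score = sum(pmi(word, s) for s in pos_seeds)
        neg_score = sum(pmi(word, s) for s in neg_seeds)
        lexicon[word] = pos_score - neg_score
    return lexicon


def classify(tokens, lexicon):
    """Lexicon-based decision: sum the word scores and take the sign."""
    total = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return "positive" if total >= 0 else "negative"


# Toy, pre-segmented corpus (hypothetical data).
# 好 = good, 差 = bad, 喜欢 = like, 失望 = disappointed, 手机 = phone
toy_docs = [
    ["好", "喜欢", "手机"],
    ["好", "喜欢"],
    ["差", "失望", "手机"],
    ["差", "失望"],
]
toy_lexicon = build_pmi_lexicon(toy_docs, pos_seeds=["好"], neg_seeds=["差"])
```

In this toy corpus, 喜欢 co-occurs only with the positive seed and 失望 only with the negative one, so they receive scores of opposite sign, while 手机 appears equally often with both seeds and ends up neutral. A scheme like the one described above could then use such a lexicon to label text in a new domain without manual annotation.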
Keywords/Search Tags: Text Sentiment Classification, Stop Words, Sentiment Lexicon, Naive Bayesian, Support Vector Machine