
Research On Text Sentiment Classification Of Chinese

Posted on: 2012-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Ceng
GTID: 2178330335950909
Subject: Computer Science and Technology
Abstract/Summary:
Text sentiment classification automatically classifies the sentiment of a text by mining and analyzing the subjective information it contains, such as standpoint, view, and mood. It is becoming more significant as more people express their viewpoints on the web. The key technologies of text sentiment classification include text extraction, text representation (vector space model, Boolean model, and probability model), feature extraction (document frequency, chi-square statistic, mutual information, information gain, expected cross entropy, and weight of evidence for text), and text classification (Bayes classifier, support vector machine, KNN, and neural networks). The main work in this paper includes the following:
(1) Implement text extraction from web pages and study text preprocessing techniques. After studying how to obtain page source code from the web server, we design a regular expression to extract the text from web pages. We then design a method that implements the vector space model to represent the text.
(2) Design and implement two sentiment sentence recognition algorithms, one based on a sentiment dictionary and one based on Naive Bayes, to separate subjective sentences from objective ones. The former obtains the sets of subjective and objective sentences by matching against the sentiment dictionary after text preprocessing and text representation. The latter applies a Naive Bayes classification model after text preprocessing, text representation, and feature extraction using information gain. The results show that the former classifies better than the latter.
(3) Propose a hybrid algorithm for extracting features from the text.
By analyzing and comparing the advantages and disadvantages of several commonly used feature extraction algorithms, we choose document frequency, mutual information, information gain, and the chi-square statistic, and then take the union of the four feature subsets they produce. The experiments show that extracting features with the hybrid algorithm yields more accurate classification than using a single algorithm.
(4) Design and implement three algorithms (support vector machine, Naive Bayes, and KNN) for text sentiment classification according to the characteristics of the feature set. The results show that the support vector machine performs best but requires the most computation, Naive Bayes ranks second with less computation, and KNN is the fastest but performs worst.
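The hybrid feature extraction in item (3) can be illustrated with a minimal sketch. The abstract does not say how each of the four subsets is cut off, so this sketch assumes each metric keeps its top-k terms before the union is taken. It also assumes mutual information and chi-square are taken as the maximum over classes. The names `term_scores` and `hybrid_features` and the toy corpus are hypothetical and not from the thesis.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def term_scores(docs, labels):
    """Score every term under DF, MI, IG and chi-square (assumed variants)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_count = Counter(labels)
    classes = sorted(class_count)
    base = entropy([class_count[c] for c in classes])

    scores = {"df": {}, "mi": {}, "ig": {}, "chi2": {}}
    for term in vocab:
        # per_class[c] = number of class-c documents that contain the term
        per_class = Counter(l for s, l in zip(doc_sets, labels) if term in s)
        df = sum(per_class.values())
        scores["df"][term] = df

        # Information gain: class entropy minus the entropy conditioned
        # on the presence or absence of the term.
        cond = df / n * entropy([per_class[c] for c in classes])
        if n - df:
            absent = [class_count[c] - per_class[c] for c in classes]
            cond += (n - df) / n * entropy(absent)
        scores["ig"][term] = base - cond

        mi_best, chi_best = float("-inf"), 0.0
        for c in classes:
            a = per_class[c]               # term present, class c
            b = df - a                     # term present, other classes
            c_only = class_count[c] - a    # term absent, class c
            d = n - a - b - c_only         # term absent, other classes
            if a:
                mi = math.log2(a * n / ((a + c_only) * (a + b)))
                mi_best = max(mi_best, mi)
            denom = (a + c_only) * (b + d) * (a + b) * (c_only + d)
            if denom:
                chi = n * (a * d - c_only * b) ** 2 / denom
                chi_best = max(chi_best, chi)
        scores["mi"][term] = mi_best
        scores["chi2"][term] = chi_best
    return scores


def hybrid_features(docs, labels, k):
    """Union of the top-k terms selected by each of the four metrics."""
    selected = set()
    for metric in term_scores(docs, labels).values():
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(metric, key=lambda t: (-metric[t], t))
        selected.update(ranked[:k])
    return selected


# Toy tokenized corpus (hypothetical data, for illustration only).
docs = [
    ["good", "movie", "great"],
    ["great", "acting", "good"],
    ["bad", "movie", "boring"],
    ["boring", "plot", "bad"],
]
labels = ["pos", "pos", "neg", "neg"]
print(sorted(hybrid_features(docs, labels, k=2)))
```

Because the union keeps any term that at least one metric ranks highly, the hybrid set is never smaller than a single metric's top-k set. The same property means it can grow to as many as 4k terms when the metrics disagree.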
Keywords/Search Tags: text classification, feature extraction, text sentiment classification