Research On Text Sentiment Classification Of Chinese

Posted on:2012-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Ceng

Full Text:PDF

GTID:2178330335950909

Subject:Computer Science and Technology

Abstract/Summary:

Text sentiment classification automatically determines the sentiment of a text by mining and analyzing the subjective information it contains, such as stance, opinion, and mood. It has become increasingly important as more people express their views on the web. The key technologies of text sentiment classification include text extraction, text representation (vector space model, Boolean model, and probabilistic model), feature extraction (document frequency, chi-square statistic, mutual information, information gain, expected cross entropy, and weight of evidence for text), and text classification (Bayes classifiers, support vector machines, KNN, and neural networks). The main work of this thesis is as follows:

(1) Implement text extraction from web pages and study text preprocessing techniques. Building on a study of how page source code is retrieved from the web server, we design a regular expression that extracts the text of web pages. We then design a method that implements the vector space model to represent the text.

(2) Design and implement two sentiment sentence recognition algorithms, one based on a sentiment dictionary and one based on Naive Bayes, to classify texts by subjectivity. The dictionary-based algorithm applies text preprocessing and text representation, then separates subjective sentences from objective ones by matching them against the sentiment dictionary. The Naive Bayes algorithm applies a Naive Bayes classification model after text preprocessing, text representation, and feature extraction with information gain. The results show that the dictionary-based algorithm classifies better than the Naive Bayes algorithm.

(3) Propose a hybrid algorithm for extracting text features.
By analyzing and comparing the strengths and weaknesses of several commonly used feature extraction algorithms, we select document frequency, mutual information, information gain, and the chi-square statistic, and take the union of the four resulting feature subsets. Experiments show that features extracted with the hybrid algorithm yield more accurate classification than features extracted with any single algorithm.

(4) Design and implement three algorithms (support vector machine, Naive Bayes, and KNN) for text sentiment classification according to the characteristics of the feature set. The results show that the support vector machine performs best but requires the most computation, Naive Bayes ranks second with less computation, and KNN is the fastest but performs worst.
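Item (1) mentions a regular expression for pulling the body text out of a page's source code, but the pattern itself is not given. The following is a minimal Python sketch under that gap: it assumes the HTML has already been downloaded and decoded to a string, and the patterns and the function name `extract_text` are illustrative assumptions, not the thesis's own.

```python
import re
from html import unescape

# Illustrative patterns only; the thesis's actual regular expression is not published here.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def extract_text(html: str) -> str:
    """Strip scripts, styles, comments and tags from decoded HTML; return plain text."""
    html = _SCRIPT_STYLE.sub(" ", html)   # drop <script>/<style> blocks with their contents
    html = _COMMENT.sub(" ", html)        # drop HTML comments
    text = _TAG.sub(" ", html)            # replace every remaining tag with a space
    text = unescape(text)                 # decode entities such as &amp; and &nbsp;
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    page = "<p>a &amp; b</p><script>x()</script>"
    print(extract_text(page))  # a & b
```

Real Chinese pages also need the correct character encoding (for example GBK or UTF-8) before the regular expressions are applied; that step is not shown.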
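Item (1) also represents each text with the vector space model. The abstract does not say which term weighting is used, so the sketch below assumes TF-IDF with weight tf(t, d) * log(N / df(t)), and assumes the Chinese text has already been segmented into words (segmentation is not shown). `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Map each tokenized document to a dense TF-IDF vector over a shared, sorted vocabulary.

    Weight used (an assumption): tf(t, d) * log(N / df(t)).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency; missing terms count as 0
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = tfidf_vectors([["电影", "好看"], ["电影", "无聊"]])
    print(vocab)
    print(vectors)
```

A term that occurs in every document (here "电影") gets weight 0, which is the intended effect of the IDF factor.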
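The dictionary-based recognizer in item (2) labels sentences as subjective or objective by matching them against a sentiment dictionary. The sketch below is one simple reading of that idea, not the thesis's exact rule: a sentence counts as subjective if it contains at least `min_hits` dictionary entries. The toy lexicon, the punctuation-based sentence splitter, and the substring matching (used here in place of word segmentation) are all assumptions.

```python
import re

# Toy lexicon for illustration only; the thesis's sentiment dictionary is not reproduced here.
SENTIMENT_WORDS = {"喜欢", "精彩", "满意", "失望", "糟糕", "无聊"}

# Sentence boundaries: Chinese and ASCII end punctuation plus semicolons (an assumption).
_BOUNDARY = re.compile(r"[。！？!?；;]+")


def split_sentences(text):
    """Split raw text into non-empty sentences on end punctuation."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def classify_sentences(text, lexicon=SENTIMENT_WORDS, min_hits=1):
    """Return (subjective, objective) sentence lists by counting lexicon matches."""
    subjective, objective = [], []
    for sentence in split_sentences(text):
        hits = sum(1 for word in lexicon if word in sentence)  # substring match, no segmentation
        (subjective if hits >= min_hits else objective).append(sentence)
    return subjective, objective


if __name__ == "__main__":
    subj, obj = classify_sentences("这部电影上映于2010年。剧情非常精彩！结尾让人失望。")
    print(subj)  # ['剧情非常精彩', '结尾让人失望']
    print(obj)   # ['这部电影上映于2010年']
```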
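The second recognizer in item (2) is a Naive Bayes classifier trained on features chosen by information gain. The sketch below assumes a multinomial model with add-one smoothing and takes the selected feature set as an input rather than computing it; the class name `MultinomialNB`, the labels, and the toy data are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over a fixed feature set."""

    def __init__(self, features):
        self.features = set(features)  # e.g. the terms ranked highest by information gain

    def fit(self, docs, labels):
        """docs: list of token lists; labels: list of class names of the same length."""
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            counts[c].update(t for t in doc if t in self.features)  # ignore unselected terms
        vocab_size = len(self.features)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((counts[c][t] + 1) / (total + vocab_size)) for t in self.features
            }
        return self

    def predict(self, doc):
        """Return the class maximizing log P(c) + sum of log P(t | c) over selected tokens."""
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in doc if t in self.features)
            for c in self.classes
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["剧情", "精彩"], ["结尾", "失望"], ["电影", "上映"], ["导演", "拍摄", "电影"]]
    labels = ["subjective", "subjective", "objective", "objective"]
    model = MultinomialNB({"精彩", "失望", "电影", "上映"}).fit(docs, labels)
    print(model.predict(["剧情", "精彩"]))  # subjective
```

Working in log space avoids underflow when many small probabilities are multiplied.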
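Item (3) takes the union of the feature subsets chosen by document frequency, mutual information, information gain, and the chi-square statistic. The abstract gives neither the subset size nor how multiple classes or ties are handled, so the sketch below assumes: the top `k` terms per criterion, binary term presence per document, the maximum over classes for MI and chi-square, and the textbook formulas noted in the comments. `score_terms` and `hybrid_select` are hypothetical names.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def score_terms(docs, labels):
    """Score every term by DF, MI, IG and CHI from binary presence counts.

    For term t and class c: a = docs in c containing t, b = docs outside c containing t,
    cc = docs in c without t, d = docs outside c without t, n = a + b + cc + d.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    df, joint = Counter(), Counter()
    for doc, label in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            joint[(t, label)] += 1
    h_c = entropy([size / n for size in class_sizes.values()])  # H(C)
    scores = {"DF": {}, "MI": {}, "IG": {}, "CHI": {}}
    for t, n_t in df.items():
        mi, chi, p_c_given_t, p_c_given_not_t = [], [], [], []
        for label, n_c in class_sizes.items():
            a = joint[(t, label)]
            b = n_t - a
            cc = n_c - a
            d = n - n_t - cc
            if a > 0:  # MI(t, c) = log(a * n / ((a + b) * (a + cc)))
                mi.append(math.log(a * n / (n_t * n_c)))
            # CHI(t, c) = n * (a*d - cc*b)^2 / ((a+cc) * (b+d) * (a+b) * (cc+d))
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            chi.append(n * (a * d - cc * b) ** 2 / denom if denom else 0.0)
            p_c_given_t.append(a / n_t)
            p_c_given_not_t.append(cc / (n - n_t) if n > n_t else 0.0)
        p_t = n_t / n
        scores["DF"][t] = n_t
        scores["MI"][t] = max(mi)  # t occurs in at least one doc, so mi is non-empty
        # IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t)
        scores["IG"][t] = h_c - p_t * entropy(p_c_given_t) - (1 - p_t) * entropy(p_c_given_not_t)
        scores["CHI"][t] = max(chi)
    return scores


def hybrid_select(docs, labels, k):
    """Union of the k best terms under each criterion (ties broken by term string)."""
    selected = set()
    for per_term in score_terms(docs, labels).values():
        ranked = sorted(per_term, key=lambda t: (-per_term[t], t))
        selected.update(ranked[:k])
    return selected


if __name__ == "__main__":
    docs = [["精彩", "剧情"], ["精彩", "演员"], ["无聊", "剧情"], ["无聊", "结尾"]]
    labels = ["pos", "pos", "neg", "neg"]
    print(hybrid_select(docs, labels, k=2))
```

Because DF ignores class labels while MI, IG, and CHI do not, the union can keep frequent terms that the class-aware criteria rank low, which is one plausible reason a combined subset can outperform any single criterion.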
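Of the three classifiers in item (4), KNN is the simplest to sketch without a library. The version below assumes documents are already sparse term-weight vectors (for example, TF-IDF weights over the selected features), uses cosine similarity, and takes a plain majority vote among the `k` nearest training documents; these choices, the function names, and the toy data are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs; majority vote among the k most similar vectors."""
    neighbours = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ({"精彩": 1.0, "剧情": 1.0}, "pos"),
        ({"精彩": 1.0, "演员": 1.0}, "pos"),
        ({"无聊": 1.0, "剧情": 1.0}, "neg"),
        ({"失望": 1.0, "结尾": 1.0}, "neg"),
    ]
    print(knn_predict(train, {"精彩": 1.0}))  # pos
```

On a tied vote, `Counter.most_common` keeps first-encountered order, so the label of the more similar neighbour wins; a similarity-weighted vote is a common alternative.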

Keywords/Search Tags:

text classification, feature extraction, text sentiment classification

Related items

1	Research On Text Sentiment Classification Of Chinese
2	A Study Of Text Classification Algorithms Based On Feature Selection
3	Research On Classification Of Massive Text Feature Under Distributed Architecture
4	Research On Text Sentiment Classification Based On Deep Learning
5	Research On The Method Of Text Feature Extraction
6	Research On Text Sentiment Classification
7	A Research Of Text Sentiment Classification Algorithm Based On Attention Mechanism
8	Research On Sentiment Text Classification For Product Reviews
9	Research On Problems For Sentiment Classification Of Review Texts Based On Web
10	Design And Implementation Of Text Classification Model Based On The Improved TF-IDF Feature Extraction