Research On Chinese Microblog Sentiment Classification Based On Automatic Annotation Training Set

Posted on:2015-06-21

Degree:Master

Type:Thesis

Country:China

Candidate:W P Liu

Full Text:PDF

GTID:2308330473457000

Subject:Information and Communication Engineering

Abstract/Summary:

Microblogging has become one of the most popular forms of social networking among netizens, and its rapid development has demonstrated substantial commercial and social value. Users have become accustomed to finding and sharing information on microblogs and to publishing their opinions on current hot topics. These opinions often carry emotion, so large-scale emotion mining of microblog data is valuable: it makes it possible to track users' responses to public events, products, and policies in a timely way, and it can provide decision support for users, government agencies, and enterprises. So far, such studies have concentrated mainly on English microblogs, and studies of Chinese microblogs are still in their infancy. The main research work of this thesis is as follows:

1. We take Sina microblog as the research object and obtain large-scale raw microblog data through its open API platform. We then analyze the microblog texts and how they differ from traditional online review texts.

2. Because no high-quality sentiment corpus for microblogs is available, this thesis proposes a method that automatically labels a microblog training corpus using emoticons and a sentiment lexicon, yielding training sets for both positive/negative classification and seven-class sentiment classification. The method removes much of the burden of manual labeling and reduces dependence on domain, topic, and time. Using this method, we construct a corpus of a certain scale.

3. This thesis divides sentiment classification into two tasks: two-class (positive, negative) classification and seven-class (happiness, love, surprise, anxiety, grief, anger, disgust) classification. We use the automatically labeled two-class corpus as the training set to train a classifier that automatically classifies the polarity of microblog posts.

4. For both classification tasks, we run experiments with n-gram features, together with cross-validation experiments that combine two feature selection methods (information gain and the chi-square statistic) with two classification algorithms (Naive Bayes and support vector machines). The results show that overall performance on positive/negative classification is better than on seven-class classification. In the positive/negative task, unigram features outperform bigram features, and information gain combined with a support vector machine (SVM) performs best. In the seven-class task, bigram features outperform unigram features, and the average F-measures obtained by combining the two feature selection methods with Naive Bayes and SVM do not differ greatly.
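The automatic labeling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual emoticon list, the lexicon, or the rule for combining them, so everything below is an assumption made for illustration: the emoticon and word sets are hypothetical, and the rule (use emoticons first, fall back to lexicon words, and discard posts with no evidence or conflicting evidence) is one plausible reading, not the thesis's method.

```python
# Hypothetical resources: the thesis does not list its emoticons or lexicon.
POSITIVE_EMOTICONS = {"[哈哈]", "[鼓掌]", "[爱你]"}
NEGATIVE_EMOTICONS = {"[怒]", "[泪]", "[衰]"}
POSITIVE_WORDS = {"喜欢", "开心"}
NEGATIVE_WORDS = {"讨厌", "失望"}


def count_hits(text, items):
    """Count how many times any of the given strings occurs in the text."""
    return sum(text.count(item) for item in items)


def auto_label(text):
    """Return 'positive', 'negative', or None if evidence is absent or tied.

    Assumed rule: emoticons are checked first; lexicon words are used only
    when the post contains no known emoticon.
    """
    pos = count_hits(text, POSITIVE_EMOTICONS)
    neg = count_hits(text, NEGATIVE_EMOTICONS)
    if pos == 0 and neg == 0:
        # No emoticon evidence: fall back to the sentiment lexicon.
        pos = count_hits(text, POSITIVE_WORDS)
        neg = count_hits(text, NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def build_training_set(posts):
    """Keep only the posts that receive an unambiguous automatic label."""
    labeled = []
    for post in posts:
        label = auto_label(post)
        if label is not None:
            labeled.append((post, label))
    return labeled
```

Calling `build_training_set` on a list of raw posts returns `(text, label)` pairs that could serve as a training set for the two-class task. Discarding ambiguous posts trades corpus size for label quality, which is a design choice of this sketch rather than something the abstract states.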

Keywords/Search Tags:

microblog, sentiment analysis, automatic annotation, feature selection

Related items

1	Study Of Microblog Sentiment Analysis Based On Semantic Feature
2	The Research Of Feature Selection Method And Sentiment Analysis Based On Microblog
3	Sentiment Classification Of Micro-blogs Corpus Based On Automatic Annotation Training Set
4	Research And Application On Chinese Micro-Blog Sentiment Classification
5	Microblog Emotional Dictionary Built And Application On Sentiment Analysis Of Microblog
6	The Research On Chinese Microblog Sentiment Analysis Based On Rules And Machine Learning Methods
7	Sentiment Analysis Of Microblog Product Reviews Based On Feature Ontology And Sentiment Lexicon
8	Research On Sentiment Analysis Of Microblog Text Based On Recognition Of Sentiment New Words
9	The Research And Implementation Of Distributed Sentiment Analysis For Chinese Microblog Based On Hadoop
10	Research On Sentiment Analysis Of Microblog