
Research On Chinese Microblog Sentiment Classification Based On Automatic Annotation Training Set

Posted on: 2015-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: W P Liu
Full Text: PDF
GTID: 2308330473457000
Subject: Information and Communication Engineering
Abstract/Summary:
Microblogging has become one of the most popular forms of social networking among netizens, and its rapid development has demonstrated great commercial and social value. Users have become accustomed to finding and sharing information on microblogs and to publishing their opinions on current hot topics. These opinions often carry emotion, so large-scale emotion mining of microblog data is valuable: it allows timely understanding of users' responses to public hot spots, products, and policies, and it can provide decision support for users, government agencies, and enterprises. So far, such studies have concentrated mainly on English microblogs, while studies on Chinese microblogs are still in their infancy. The main research work of this paper is as follows:

1. We take Sina microblog as the research object and obtain large-scale raw microblog data through its open API platform. We then analyze the microblog texts and the differences between microblog text and traditional online comment text.

2. Because no high-standard microblog emotion corpus is available, this paper proposes a method for automatically tagging a microblog training corpus based on emoticons and an emotion lexicon. The method yields training sets for both positive/negative emotion classification and seven-class sentiment classification. It removes much of the burden of manual tagging and reduces the dependence on domain, theme, and time. Using this method, we construct a corpus of a certain scale.

3. This paper divides sentiment classification into two tasks: two-class classification (positive, negative) and seven-class classification (happy, loved, surprised, anxiety, grief, anger, evil). We use the automatically labeled corpus for the two-class task as the training set to train a classifier that automatically classifies the polarity of microblog posts.

4. For both classification tasks, we carry out experiments based on n-gram features, as well as cross-validation experiments that combine two feature selection methods (Information Gain and the CHI-square statistic) with two classification algorithms (Naive Bayes and Support Vector Machine). The experimental results show that overall performance on positive/negative sentiment classification is better than on seven-class sentiment classification. In the positive/negative task, Unigram features perform better than Bigram features, and Information Gain combined with Support Vector Machine (SVM) performs best. In the seven-class task, Bigram features perform better than Unigram features, and when the two feature selection methods are combined with the Naive Bayes and Support Vector Machine algorithms, the average F-measures do not differ greatly.
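
The automatic annotation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual emoticon list, lexicon, or labeling rule, so everything below is an assumption made for illustration: the seed sets, the function names `count_hits`, `auto_label`, and `build_training_set`, and the rule itself (a post is labeled only when its emoticons point to one polarity and the lexicon words do not contradict it).

```python
# Hypothetical seed resources; the thesis's real emoticon list and lexicon
# are not given in the abstract.
POSITIVE_EMOTICONS = {"[哈哈]", "[鼓掌]", ":)"}
NEGATIVE_EMOTICONS = {"[怒]", "[泪]", ":("}
POSITIVE_WORDS = {"开心", "喜欢"}
NEGATIVE_WORDS = {"讨厌", "难过"}


def count_hits(text, markers):
    """Count the total occurrences of every marker string in text."""
    return sum(text.count(marker) for marker in markers)


def auto_label(text):
    """Return 'positive', 'negative', or None if the evidence is absent or mixed.

    Assumed rule (not taken from the thesis): emoticons must point to exactly
    one polarity, and lexicon hits for the opposite polarity must not
    outnumber hits for the same polarity.
    """
    pos_emo = count_hits(text, POSITIVE_EMOTICONS)
    neg_emo = count_hits(text, NEGATIVE_EMOTICONS)
    pos_lex = count_hits(text, POSITIVE_WORDS)
    neg_lex = count_hits(text, NEGATIVE_WORDS)

    if pos_emo > 0 and neg_emo == 0 and pos_lex >= neg_lex:
        return "positive"
    if neg_emo > 0 and pos_emo == 0 and neg_lex >= pos_lex:
        return "negative"
    return None  # unlabeled posts are left out of the training set


def build_training_set(posts):
    """Keep only the posts that received a label, as (text, label) pairs."""
    labeled = ((post, auto_label(post)) for post in posts)
    return [(post, label) for post, label in labeled if label is not None]


if __name__ == "__main__":
    sample_posts = ["今天真开心[哈哈]", "太讨厌了[怒]", "普通的一天", "真难过[哈哈]"]
    for text, label in build_training_set(sample_posts):
        print(label, text)
```

Discarding posts with missing or conflicting signals trades corpus size for label precision; a real pipeline would presumably also strip the emoticons from the text before training, so that the classifier does not simply relearn the labeling rule.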
Keywords/Search Tags: microblog, sentiment analysis, automatic annotation, feature selection