Semi-supervised Sentiment Classification Based On Ensemble Learning With Voting Combination

Posted on: 2016-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Huang
Full Text: PDF
GTID: 2308330476953452
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, more and more people are willing to express their views online. Analyzing and mining these subjective texts can identify their underlying sentiment orientation, which has important application value in many fields, such as e-commerce and public opinion monitoring. Sentiment classification has therefore become a research hotspot in natural language processing. In this paper, we focus on semi-supervised approaches to this problem.

The traditional method based on Co-training requires texts to be described by a large number of useful attribute sets; its training process has linear time complexity, and it is not applicable to imbalanced corpora.

This paper proposes a basic hypothesis: if the sub-classifiers give similar opinions on a text, the prediction for that text deserves a higher confidence level than the prediction for a text on which the sub-classifiers disagree, and the larger the disagreement, the lower the confidence level. We have verified this hypothesis on a large number of datasets.

Based on this hypothesis, this paper presents a semi-supervised sentiment classification method based on ensemble learning with voting combination. We construct a set of diverse sub-classifiers by choosing different training sets, feature parameters, and classification methods. During each voting round, the samples with the highest confidence are selected to double the size of the training set, and the trained model is then updated. The new method also allows sub-classifiers to share useful attribute sets. It has logarithmic time complexity and can be used on imbalanced corpora. Experiments show that the method achieves good results on sentiment classification tasks across corpora of different languages, domains, and sizes, both balanced and imbalanced.
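The hypothesis and the doubling step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the confidence formula or the selection procedure, so everything here is an assumption made for illustration: the function names `vote`, `select_confident`, and `doubling_rounds` are hypothetical, sub-classifier outputs are assumed to be probabilities of the positive class in [0, 1], and disagreement is measured by the population standard deviation of those scores. The last function shows one plausible reading of the logarithmic-complexity claim: if each round doubles the training set, the number of rounds grows with the logarithm of the pool size.

```python
from statistics import mean, pstdev


def vote(scores):
    """Combine sub-classifier scores for one text into (label, confidence).

    Assumption: each score is P(positive) in [0, 1]. Disagreement is the
    population standard deviation, which is at most 0.5 on that range,
    so the confidence stays in [0, 1].
    """
    label = 1 if mean(scores) >= 0.5 else 0
    confidence = 1.0 - 2.0 * pstdev(scores)
    return label, confidence


def select_confident(unlabeled_scores, k):
    """Return the k (item, voted_label) pairs with the highest confidence.

    unlabeled_scores maps an item id to its list of sub-classifier scores.
    """
    voted = {item: vote(s) for item, s in unlabeled_scores.items()}
    ranked = sorted(voted, key=lambda item: voted[item][1], reverse=True)
    return [(item, voted[item][0]) for item in ranked[:k]]


def doubling_rounds(n_labeled, n_unlabeled):
    """Count voting rounds needed when each round doubles the training set."""
    if n_labeled <= 0:
        raise ValueError("need at least one labeled sample to start")
    rounds = 0
    while n_unlabeled > 0:
        take = min(n_labeled, n_unlabeled)  # add as many as we already have
        n_labeled += take
        n_unlabeled -= take
        rounds += 1
    return rounds


if __name__ == "__main__":
    pool = {
        "a": [0.90, 0.80, 0.85],  # sub-classifiers agree: positive
        "b": [0.90, 0.10, 0.50],  # sub-classifiers disagree
        "c": [0.10, 0.15, 0.10],  # sub-classifiers agree: negative
    }
    print(select_confident(pool, 2))
    print(doubling_rounds(100, 10000))
```

In this sketch the text on which the sub-classifiers disagree ("b") is left out of the selection, matching the idea that larger disagreement means lower confidence. A real system would retrain the sub-classifiers after each round before scoring the remaining pool again.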
Keywords/Search Tags: Machine Learning, Sentiment Classification, Ensemble Learning, Semi-supervised Learning