Research On Feature Selection For Sentiment Classification

Posted on: 2015-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
Full Text: PDF
GTID: 2268330428998399
Subject: Computer application technology
Abstract/Summary:
Recently, sentiment analysis has been proposed to analyze subjective information automatically, and it has received a great deal of attention. Sentiment analysis aims to extract valuable information from subjective texts using natural language processing (NLP) techniques. Like many other text classification tasks, sentiment classification suffers from the high dimensionality of the data, which can make learning algorithms intractable. Feature selection for sentiment classification is therefore valuable for both practical applications and theoretical studies. This dissertation studies feature selection for sentiment classification in three settings, as follows.

First, this dissertation focuses on sentiment classification in which the class distribution of the data is imbalanced (imbalanced sentiment classification). We investigate three feature selection mechanisms based on under-sampling and compare the performance of four classic feature selection (FS) methods under each mechanism. The experimental results demonstrate that these feature selection methods can significantly reduce the dimension of the feature vector in imbalanced sentiment classification.

Second, this dissertation proposes a novel feature selection method for semi-supervised sentiment classification that is based on a bipartite graph. Features are selected according to their probabilities of belonging to each sentiment category. These probabilities are estimated with a bipartite graph model and a label propagation algorithm. Experimental results on multiple domains demonstrate that our feature selection method performs much better than random feature selection.
Our approach can significantly reduce the dimension of the feature vector without any loss of classification performance in semi-supervised sentiment classification.

Third, this dissertation proposes a feature selection method for cross-language sentiment classification, where feature extension leads to extremely high dimensionality. The main idea of our approach is to use information gain (IG) to select common features from the labeled and unlabeled data. Point-wise mutual information (PMI) is then applied to obtain additional features that are unique to the unlabeled data. Empirical studies demonstrate that our approach can significantly reduce the dimension of the feature vector in bilingual sentiment classification.
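The under-sampling idea in the first study can be illustrated with a minimal sketch of one possible mechanism. The sketch randomly under-samples every class down to the size of the smallest class, then ranks terms on the balanced sample with a classic filter score. The abstract neither describes the three mechanisms nor names the four FS methods. The "under-sample, then select" ordering, the chi-square score, and the function names `undersample`, `chi_square_scores` and `select_features` are therefore illustrative assumptions, not the thesis's implementation.

```python
import random

# Hypothetical sketch: one possible "under-sample, then select" mechanism.
# The chi-square score stands in for any of the (unnamed) classic FS methods.

def undersample(docs, labels, seed=0):
    """Random under-sampling: keep n_min documents per class, where n_min
    is the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    n_min = min(len(class_docs) for class_docs in by_class.values())
    out_docs, out_labels = [], []
    for label in sorted(by_class):
        for doc in rng.sample(by_class[label], n_min):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


def chi_square_scores(docs, labels, target):
    """Chi-square statistic of each term for the split label == target vs.
    the rest, from document-presence counts:
    chi2 = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == target)
        b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != target)
        c = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == target)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, target, k, seed=0):
    """Balance the classes first, then keep the k highest-scoring terms
    (ties broken alphabetically so the result is deterministic)."""
    bal_docs, bal_labels = undersample(docs, labels, seed)
    scores = chi_square_scores(bal_docs, bal_labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with four positive and two negative reviews, the positive class is cut to two documents before scoring. Terms are therefore ranked on a balanced sample rather than on one dominated by the majority class.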
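The bipartite-graph method in the second study can be approximated with a small label-propagation sketch over a document-word graph. The sketch rests on several assumptions that do not come from the thesis. An edge links a document to each distinct word it contains. Labeled documents are clamped to one-hot class distributions. Each node repeatedly takes the average distribution of its neighbors. A word's selection score is its largest class probability. The function name `propagate_and_select` is likewise hypothetical.

```python
# Hypothetical sketch of bipartite-graph label propagation for feature
# selection; update rules and scoring are assumptions, not the thesis's method.

def propagate_and_select(docs, seed_labels, classes, k, iters=20):
    """docs: list of token lists; seed_labels: {doc_index: class} for the
    labeled documents (all others are unlabeled). Returns the k selected
    words and each word's class distribution. Requires iters >= 1."""
    doc_words = [sorted(set(doc)) for doc in docs]
    vocab = sorted({w for words in doc_words for w in words})
    word_docs = {w: [] for w in vocab}
    for i, words in enumerate(doc_words):
        for w in words:
            word_docs[w].append(i)

    # Labeled documents are clamped to one-hot distributions;
    # unlabeled documents start uniform.
    doc_dist = []
    for i in range(len(docs)):
        if i in seed_labels:
            doc_dist.append({c: float(c == seed_labels[i]) for c in classes})
        else:
            doc_dist.append({c: 1.0 / len(classes) for c in classes})

    word_dist = {}
    for _ in range(iters):
        # Documents -> words: each word averages its documents' distributions.
        for w in vocab:
            ds = word_docs[w]
            word_dist[w] = {c: sum(doc_dist[i][c] for i in ds) / len(ds) for c in classes}
        # Words -> documents: only unlabeled documents are updated.
        for i, words in enumerate(doc_words):
            if i not in seed_labels and words:
                doc_dist[i] = {c: sum(word_dist[w][c] for w in words) / len(words)
                               for c in classes}

    # A word's score is its largest class probability (how polarized it is).
    score = {w: max(word_dist[w].values()) for w in vocab}
    selected = sorted(vocab, key=lambda w: (-score[w], w))[:k]
    return selected, word_dist
```

Consider a toy graph with one positive and one negative seed review. A word that occurs only in the negative seed keeps probability 1.0 for that class. A word shared symmetrically by a positive-leaning and a negative-leaning document stays near 0.5 and is ranked last.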
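The two-stage selection in the third study might look roughly like the following sketch. It makes two interpretive assumptions. First, "common features" are taken to be terms that occur in both the labeled and the unlabeled data, ranked by IG computed on the labeled data. Second, a term unique to the unlabeled data is scored by its highest document-level PMI with any selected common feature. The abstract does not specify how PMI is applied, and the function names are hypothetical. Tokens are treated as opaque strings, so they may come from either language in the bilingual setting.

```python
import math

# Hypothetical sketch of two-stage selection: IG picks common features,
# PMI adds features that occur only in the unlabeled data.

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(labeled, labels, term):
    """IG of the class label given whether `term` occurs in a document."""
    classes = sorted(set(labels))
    with_t = [y for doc, y in zip(labeled, labels) if term in doc]
    without_t = [y for doc, y in zip(labeled, labels) if term not in doc]
    n = len(labels)
    prior = entropy([labels.count(c) for c in classes])
    cond = sum(len(part) / n * entropy([part.count(c) for c in classes])
               for part in (with_t, without_t) if part)
    return prior - cond


def pmi(docs, a, b):
    """Document-level PMI: log2(P(a, b) / (P(a) P(b)))."""
    n = len(docs)
    n_a = sum(1 for doc in docs if a in doc)
    n_b = sum(1 for doc in docs if b in doc)
    n_ab = sum(1 for doc in docs if a in doc and b in doc)
    if n_ab == 0:
        return float("-inf")
    return math.log2(n_ab * n / (n_a * n_b))


def select_two_stage(labeled, labels, unlabeled, k_common, k_unique):
    labeled = [set(doc) for doc in labeled]
    unlabeled = [set(doc) for doc in unlabeled]
    lab_vocab = set().union(*labeled)
    unl_vocab = set().union(*unlabeled)

    # Stage 1: rank terms shared by both sets by IG on the labeled data.
    common = lab_vocab & unl_vocab
    ig = {t: information_gain(labeled, labels, t) for t in common}
    chosen = sorted(common, key=lambda t: (-ig[t], t))[:k_common]

    # Stage 2: score each unlabeled-only term by its strongest PMI with a
    # chosen feature; keep only positively associated terms.
    unique = unl_vocab - lab_vocab
    score = {u: max((pmi(unlabeled, u, s) for s in chosen), default=float("-inf"))
             for u in unique}
    ranked = sorted(unique, key=lambda u: (-score[u], u))
    extra = [u for u in ranked if score[u] > 0][:k_unique]
    return chosen, extra
```

Keeping only terms with positive PMI means a unique term is added only if it co-occurs with a selected feature more often than chance would predict.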
Keywords/Search Tags: Sentiment Classification, Feature Selection, Semi-supervised Learning, Imbalanced Classification, Bilingual Language Processing