Research On Feature Selection For Sentiment Classification

Posted on:2015-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wang

Full Text:PDF

GTID:2268330428998399

Subject:Computer application technology

Abstract/Summary:

Recently, sentiment analysis has been proposed to analyze subjective information automatically, and it has received a great deal of attention. Sentiment analysis aims to extract valuable information from subjective texts using natural language processing technology. Like many other text classification tasks, sentiment classification suffers from high-dimensional data, which sometimes makes learning algorithms intractable. Feature selection for sentiment classification is therefore highly valuable for both practical applications and theoretical study. This dissertation studies feature selection methods for sentiment classification as follows.

First, this dissertation focuses on sentiment classification where the data distribution is imbalanced (referred to as imbalanced sentiment classification). We investigate three feature selection mechanisms based on under-sampling and compare the performance of four classic feature selection (FS) methods under these mechanisms. The experimental results demonstrate that feature selection can significantly reduce the dimension of the feature vector in imbalanced sentiment classification.

Second, this dissertation proposes a novel feature selection method based on a bipartite graph for semi-supervised sentiment classification. Features are selected according to their probabilities of belonging to the sentiment categories, which are obtained with a bipartite graph model and a label propagation algorithm. Experimental results on multiple domains demonstrate that our feature selection method performs much better than random feature selection. Our approach can significantly reduce the dimension of the feature vector without any loss in semi-supervised sentiment classification performance.

Third, this dissertation proposes a feature selection method for cross-language sentiment classification to address the extremely high dimensionality caused by feature extension. The main idea of our approach is to use information gain (IG) to pick features common to the labeled and unlabeled data. Point-wise mutual information (PMI) is then applied to obtain the remaining features unique to the unlabeled data. Empirical studies demonstrate that our approach can significantly reduce the dimension of the feature vector in bilingual sentiment classification.
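
The abstract names information gain (IG) as one of the criteria used to rank features. As a rough illustration only, the sketch below scores terms by IG under two assumptions that are not stated in the abstract: each document is represented as a set of tokens (binary term presence), and labels are plain class strings. The function names (`entropy`, `information_gain`, `select_top_k`) and the toy corpus are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(term) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t).

    Assumes binary term presence: a document either contains `term` or not.
    """
    n = len(docs)
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_t.values())
    n_without = n - n_with

    h_class = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_t.values())) \
        + (n_without / n) * entropy(list(without_t.values()))
    return h_class - h_cond


def select_top_k(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Toy corpus (hypothetical): "good"/"bad" separate the classes, the rest do not.
docs = [
    {"good", "movie"},
    {"good", "acting"},
    {"bad", "movie"},
    {"bad", "acting"},
]
labels = ["pos", "pos", "neg", "neg"]

print(select_top_k(docs, labels, 2))  # ['bad', 'good']
```

In this toy corpus, "good" and "bad" split the classes perfectly and receive an IG of 1 bit, while "movie" and "acting" appear equally in both classes and receive 0. Keeping only the top-ranked terms is what reduces the dimension of the feature vector.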

Keywords/Search Tags:

Sentiment Classification, Feature Selection, Semi-supervised Learning, Imbalanced Classification, Bilingual Language Processing

Related items

1	Research On Sentiment Classification Based-upon Imbalanced Data
2	Sentiment Classification With Bilingual Text
3	Selection And Classification Of Unbalanced Data Based On Semi-Supervised And Integrated Learning
4	Feature Selection And Semi-supervised Classification For Imbalanced Data
5	Research On Active Learning For Sentiment Classification
6	Research On Feature Selection And Semi-Supervised Classification
7	Research On Imbalanced Dataset Classification In Semi-supervised Learning
8	Based On The Positive And Unlabeled Samples, Semi-supervised Classification
9	Sentiment Classification Research Based On Semi-supervised Learning
10	A Research On Imbalanced Learning Based On Semi-supervised SVM