Study On Domain Adaptation For Sentiment Classification

Posted on: 2012-12-24    Degree: Master    Type: Thesis
Country: China    Candidate: W R Yang    Full Text: PDF
GTID: 2218330368992245    Subject: Computer application technology
Abstract/Summary:
With the rapid development and popularization of the Internet, more and more subjective remarks are available online. Traditional topic-based text classification methods cannot identify the semantic orientation of these remarks well enough to meet people's needs, so sentiment classification has attracted growing attention from researchers.

Sentiment classification is highly domain-specific: a classifier trained in one domain usually performs poorly in others. Training a separate classification model for every domain would require a large annotated corpus for each one. Because labeling data is time-consuming and expensive, domain adaptation approaches are valuable for handling cross-domain sentiment classification.

This study focuses on domain adaptation for sentiment classification. Our main work and contributions are:

(1) To reduce the difference in the statistical distribution of features between domains, we propose a novel feature selection approach that incorporates feature similarity. It selects sentiment features whose statistical distributions are similar in the two domains, which improves classification performance.

(2) We propose a novel domain adaptation approach for sentiment classification based on centroid transfer. The approach uses the labeled documents of the source domain to label the documents of the target domain and adds the most confidently labeled target documents to the training set. At the same time, it removes source-domain documents that are far from the centroid of the test (target-domain) data. Iterating between the two domains gradually narrows the distance between their centroids and so reduces the differences between the domains. The experimental results indicate that the proposed approach significantly improves the performance of cross-domain sentiment analysis.

(3) Based on the finding that documents in the same domain may differ in their features, while documents from different domains may share certain similar features, we propose a new classification approach. Specifically, the documents of the two domains are first clustered together, and classification is then performed within each cluster. This approach reduces the differences between the domains and thus improves the classification results.
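
The iterative procedure described in contribution (2) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not state the classifier, the confidence measure, or how many documents are added or removed per round. The sketch below assumes a nearest-centroid classifier on dense feature vectors, uses the gap between the two centroid distances as a hypothetical confidence score, and introduces the names `centroid_transfer`, `add_k`, and `drop_k` purely for illustration.

```python
import numpy as np

def class_centroids(X, y):
    """Mean vector of each class (0 = negative, 1 = positive)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def centroid_transfer(Xs, ys, Xt, n_iter=5, add_k=2, drop_k=1):
    """Shift a source-trained training set toward the target domain.

    Assumed (not from the thesis): nearest-centroid labeling, Euclidean
    distance, fixed add_k / drop_k per round.
    """
    X_train, y_train = Xs.copy(), ys.copy()
    is_source = np.ones(len(Xs), dtype=bool)  # rows that came from the source
    unlabeled = np.ones(len(Xt), dtype=bool)  # target docs not yet added
    t_cent = Xt.mean(axis=0)                  # centroid of the target data

    for _ in range(n_iter):
        cents = class_centroids(X_train, y_train)
        # Distance of every target document to both class centroids.
        d = np.linalg.norm(Xt[:, None, :] - cents[None, :, :], axis=2)
        pred = d.argmin(axis=1)
        margin = np.abs(d[:, 0] - d[:, 1])    # larger gap = more confident

        # Add the most confidently labeled target documents.
        cand = np.flatnonzero(unlabeled)
        if cand.size:
            chosen = cand[np.argsort(-margin[cand])[:add_k]]
            X_train = np.vstack([X_train, Xt[chosen]])
            y_train = np.concatenate([y_train, pred[chosen]])
            is_source = np.concatenate(
                [is_source, np.zeros(len(chosen), dtype=bool)])
            unlabeled[chosen] = False

        # Remove the source documents farthest from the target centroid.
        src_idx = np.flatnonzero(is_source)
        dist = np.linalg.norm(X_train[src_idx] - t_cent, axis=1)
        keep = np.ones(len(X_train), dtype=bool)
        dropped = 0
        for i in src_idx[np.argsort(-dist)]:
            if dropped == drop_k:
                break
            # Never remove the last remaining training example of a class.
            if np.sum(keep & (y_train == y_train[i])) > 1:
                keep[i] = False
                dropped += 1
        X_train, y_train, is_source = X_train[keep], y_train[keep], is_source[keep]

    # Final labels for all target documents from the adapted centroids.
    cents = class_centroids(X_train, y_train)
    d = np.linalg.norm(Xt[:, None, :] - cents[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy data: the target domain is shifted relative to the source domain.
Xs = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]], dtype=float)
ys = np.array([0, 0, 0, 1, 1, 1])
Xt = np.array([[1, 2], [1, 3], [2, 3], [11, 2], [11, 3], [12, 3]], dtype=float)
print(centroid_transfer(Xs, ys, Xt))
```

In a real setting the rows of `Xs` and `Xt` would be document feature vectors (for example, term weights), and the stopping rule and per-round counts would need tuning on held-out data.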
Keywords/Search Tags: Sentiment Classification, Domain Adaptation, Feature Selection, Centroid Transfer, Clustering