Sentiment Classification With Bilingual Text

Posted on: 2014-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Su
Full Text: PDF
GTID: 2248330398965370
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of Internet technology, the volume of subjective text published by users has expanded rapidly. This opinion information is of great value in real applications, and how to process such vast amounts of information automatically has become an important research issue, which gave rise to the study of sentiment analysis. Within sentiment analysis, sentiment classification is a basic task and has undergone significant development.

Existing studies on sentiment classification mainly focus on English. However, with the rapid growth of international web sites, more and more multilingual text is appearing on the Internet. The study of multi-language sentiment classification methods is therefore highly valuable for both practical applications and theoretical research. In this thesis, we focus on sentiment classification involving two different languages, which we call bilingual sentiment classification. In detail, our study covers the following three aspects.

First, this thesis proposes a novel method for constructing a Chinese sentiment lexicon from existing English resources. The method has three steps: (1) we use a machine translation system with bilingual resources, i.e., English and Chinese information; (2) we obtain the sentiment orientation of Chinese words by computing their pointwise mutual information (PMI) values with English seed words; (3) we adopt the label propagation (LP) algorithm to build the Chinese sentiment lexicon. Experimental results demonstrate that the lexicon generated with our approach reaches excellent precision and covers domain information effectively.

Second, this thesis proposes a novel sentiment classification approach with bilingual feature extension. That is, a document is represented by both its source-language text and its translation for sentiment classification. Because the multi-language expression provides additional classification information, experiments show that our method achieves much better classification performance.
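As a rough illustration of bilingual feature extension, the sketch below builds one bag-of-words vector from a document's source-language tokens together with its translation. It is not the thesis implementation: the function names, the `src:`/`tgt:` prefixes, and the tiny word-by-word dictionary standing in for a machine translation system are all assumptions made for this example.

```python
from collections import Counter

# Toy stand-in for a machine translation system (hypothetical; the abstract
# does not name the MT system that was actually used).
TOY_DICTIONARY = {"好": "good", "坏": "bad", "电影": "movie", "很": "very"}


def toy_translate(tokens):
    """Translate token by token, dropping words the toy dictionary lacks."""
    return [TOY_DICTIONARY[t] for t in tokens if t in TOY_DICTIONARY]


def bilingual_features(source_tokens, translate=toy_translate):
    """Bag-of-words over the source tokens plus their translation.

    Language prefixes keep the two vocabularies apart in one feature space.
    """
    features = Counter(f"src:{t}" for t in source_tokens)
    features.update(f"tgt:{t}" for t in translate(source_tokens))
    return features


doc = ["电影", "很", "好"]
feats = bilingual_features(doc)
```

The resulting counter can be passed to any bag-of-words classifier; the extended vector roughly doubles in size, which is one reason feature selection becomes relevant.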
Meanwhile, two different feature selection modes are proposed and compared to check the effectiveness of feature selection methods in this specific task, i.e., bilingual sentiment classification. The experimental results demonstrate that feature selection can significantly reduce the dimension of the feature vector without any loss in classification performance.

Third, we propose a novel semi-supervised learning method based on multi-view learning. The main idea of our approach is to use both feature partition and language translation strategies to generate multiple views, and then to apply a standard co-training algorithm to perform multi-view learning for semi-supervised sentiment classification. In the implementation, the feature partition strategy divides the entire feature space into several independent views, while the language translation strategy translates the source-language text into another language and thereby generates different language views. Empirical studies demonstrate that our proposed approach is more effective than other popular semi-supervised classification methods.
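The co-training idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: it uses only two language views (source tokens and translated tokens) and omits the feature partition views, the per-view classifier is a small naive Bayes model chosen here for brevity, and the names `NaiveBayes`, `co_train`, and `per_class` are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def log_scores(self, tokens):
        total = sum(self.prior.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / total)
            for t in tokens:
                if t in self.vocab:  # unseen tokens are ignored
                    score += math.log((self.counts[c][t] + 1) / denom)
            scores[c] = score
        return scores

    def predict_with_margin(self, tokens):
        """Return (best label, margin over the runner-up) as a confidence proxy."""
        ranked = sorted(self.log_scores(tokens).items(), key=lambda kv: -kv[1])
        margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else 0.0
        return ranked[0][0], margin


def fit_view(labeled, view):
    return NaiveBayes().fit([x[view] for x, _ in labeled], [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=5, per_class=1):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = fit_view(labeled, view)
            preds = [clf.predict_with_margin(x[view]) for x in pool]
            picked = set()
            # Each view's classifier hands over its most confident
            # example(s) per class, labeled with its own prediction.
            for c in clf.classes:
                ranked = sorted(
                    ((m, i) for i, (lab, m) in enumerate(preds) if lab == c),
                    reverse=True,
                )
                for _, i in ranked[:per_class]:
                    labeled.append((pool[i], c))
                    picked.add(i)
            pool = [x for i, x in enumerate(pool) if i not in picked]
        if not pool:
            break
    return [fit_view(labeled, v) for v in (0, 1)], labeled


# Toy data: view 0 is Chinese tokens, view 1 is a made-up English translation.
labeled = [
    ((["好", "电影"], ["good", "movie"]), "pos"),
    ((["坏", "电影"], ["bad", "movie"]), "neg"),
]
unlabeled = [
    (["好", "故事"], ["good", "story"]),
    (["坏", "故事"], ["bad", "story"]),
    (["棒", "故事"], ["good", "story"]),
    (["差", "故事"], ["bad", "story"]),
]
classifiers, grown = co_train(labeled, unlabeled)
```

In this toy run the Chinese view cannot label the documents containing 棒 or 差, because those words never appear in the seed data, but the translated view can, and the labels it contributes then become training data for the Chinese view. That exchange between views is the point of co-training; a real setup would also need a stopping rule and a confidence threshold.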
Keywords/Search Tags: Sentiment Classification, Multi-language, Sentiment Lexicon, Feature Extension, Semi-supervised