
Cross-Domain Sentiment Classification Based On Polarity Of Features

Posted on: 2017-04-02 | Degree: Master | Type: Thesis
Country: China | Candidate: W Chen | Full Text: PDF
GTID: 2348330485462201 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, a large number of microblog posts and product reviews have emerged. This information often carries an emotional bias and reflects people's concern for social, economic and other events, so it is becoming increasingly important for customers, producers and the government. However, the data is large in volume, is produced quickly, follows different distributions, and is largely unlabeled, all of which pose a huge challenge to current data mining work. In this thesis, we first study data with different distributions and large amounts of unlabeled information, and then study adaptability to an online environment in which data is large in volume and produced quickly. The main work is as follows:
(1) First, we give a general overview of cross-domain sentiment classification and data stream mining, including their development background and significance and the main research status, and we present the challenges of the online setting.
(2) Second, to address the large amount of unlabeled data and the differing distributions in online reviews, we propose a Polarity Transferring approach based on clusters of Word Embedding (PTWE). We train word vectors with word embedding and distinguish shared words from domain-specific words based on the similarity of their word vectors. The polarity of target-specific clusters is computed by transferring the polarity of source-specific clusters. The experimental results show the effectiveness of our approach.
(3) Finally, to address the poor adaptability of data stream classification to large amounts of unlabeled data and to concept drift in online reviews, we propose Self-Adaptation Online Classification (SAOC). The algorithm uses a labeled data chunk as the starting chunk and extracts features from the labeled data chunk and the unlabeled data chunk. It uses the similarity of features between the two data chunks to test for concept drift, and calculates the polarity of the features of the unlabeled data chunks to predict their instances. The experimental results show that our algorithm can improve classification accuracy, especially when there is little label information and more concept drift.
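The polarity-transfer idea in item (2) can be illustrated with a minimal sketch. This is not the PTWE algorithm itself, whose details the abstract does not give. It assumes that each cluster is represented by a centroid vector, that source-specific clusters carry a known polarity in [-1, 1], and that a target-specific cluster receives the cosine-similarity-weighted mean of the source polarities. The function names, the weighting rule and the toy 2-D vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def transfer_polarity(source_clusters, target_centroids):
    """Assign a polarity to each target-specific cluster.

    source_clusters: list of (centroid, polarity) pairs, polarity in [-1, 1].
    target_centroids: dict mapping a cluster name to its centroid.
    Assumption (not from the thesis): similarity-weighted mean of the
    source polarities, ignoring negatively correlated clusters.
    """
    result = {}
    for name, centroid in target_centroids.items():
        weights = [max(cosine(centroid, c), 0.0) for c, _ in source_clusters]
        total = sum(weights)
        if total == 0:
            result[name] = 0.0  # no similar source cluster: stay neutral
        else:
            weighted = sum(w * p for w, (_, p) in zip(weights, source_clusters))
            result[name] = weighted / total
    return result


# Toy data: one positive and one negative source-specific cluster.
source_clusters = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
target_centroids = {"target_a": [0.9, 0.1], "target_b": [0.1, 0.9]}
polarity = transfer_polarity(source_clusters, target_centroids)
```

In this toy setup, `target_a` lies close to the positive source cluster and receives a positive polarity, while `target_b` receives a negative one.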
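The concept-drift test in item (3) compares the features of two data chunks. The abstract does not specify the similarity measure, so the sketch below is only an illustration under stated assumptions: features are the most frequent whitespace-separated tokens of a chunk, similarity is the Jaccard index of the two feature sets, and drift is flagged when the similarity falls below a hypothetical threshold of 0.3. None of these choices is taken from the thesis.

```python
from collections import Counter


def top_features(chunk, k=20):
    """Return the set of the k most frequent tokens in a chunk of reviews."""
    counts = Counter(word for doc in chunk for word in doc.lower().split())
    return {word for word, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drift_detected(prev_chunk, new_chunk, k=20, threshold=0.3):
    """Flag concept drift when the chunks share too few features.

    The threshold is an assumed value chosen for illustration only.
    """
    similarity = jaccard(top_features(prev_chunk, k), top_features(new_chunk, k))
    return similarity < threshold


# Toy chunks: the second shares its vocabulary with the first, the third does not.
labeled_chunk = ["great battery great screen", "poor battery life"]
similar_chunk = ["great screen poor battery", "battery life great"]
drifted_chunk = ["boring plot weak acting", "slow plot dull ending"]

no_drift = drift_detected(labeled_chunk, similar_chunk)
drift = drift_detected(labeled_chunk, drifted_chunk)
```

Here `no_drift` is `False` because the two chunks use the same vocabulary, and `drift` is `True` because the third chunk shares no features with the labeled one.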
Keywords/Search Tags: cross-domain, sentiment classification, word embedding, online environment