The Research Of Cross-Domain Sentiment Classification Algorithms For Chinese Short-Texts

Posted on: 2017-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2348330509953989
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of e-commerce and the rise of microblogs and WeChat, the volume of short-text comments on the Internet is growing exponentially, and these comments carry considerable economic and social value. Manual processing is becoming increasingly impractical, so automatically mining useful information from comments has become a research hotspot in natural language processing. Sentiment classification for Chinese short texts emerged in response, and cross-domain sentiment classification, which can be applied to domains that lack labeled comments, is gradually gaining popularity.

Sentiment classification is a subjective text-mining technique mainly used to estimate a commentator's sentiment orientation and attitude (for example positive or negative, recommended or not recommended) toward entities such as products, services, and events. Based on a study of existing sentiment classification algorithms and related technologies, this thesis proposes several sentiment classification algorithms. The research results are as follows.

(1) A sentiment classification algorithm based on a Sentiment Sensitive Thesaurus (SST) is proposed. To address the discrepancy between the source domain and the target domain in sentiment classification, the thesis puts forward a method for constructing an SST and uses it to extend the feature vectors of the comment texts in both domains. The SST is built from all comments of the source and target domains and therefore contains features of both. The algorithm trains a support vector machine classifier on the extended source-domain comments and then uses that classifier to predict the extended target-domain comments. Experiments on hotel, computer, and book data sets show that the SST-based algorithm achieves superior classification performance. The thesis also discusses the effects of the parameter K and of the training-set size on the classification results.

(2) A cross-domain sentiment classification algorithm based on voting integration is proposed. Drawing on ensemble learning theory, it combines the results of multiple base classifiers to improve classification performance. The experiments use both simple voting and weighted voting strategies to combine the base classifiers. Tests on three corpora (hotels, computers, and books) show that the voting-based cross-domain classifier clearly outperforms the base classifiers.

(3) An improved stacking algorithm for cross-domain sentiment classification is proposed. It first uses the NTU Sentiment Dictionary to classify the target-domain comments without supervision, labels those comments whose sentiment polarity is most evident, and adds these labeled comments to the source-domain data, which extends the training set and reduces the discrepancy between domains. The experiments show that the improved stacking algorithm obtains superior classification results. Applying ensemble learning to cross-domain sentiment classification is worth further research.
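The simple and weighted voting strategies mentioned in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the labels "pos" and "neg", and the example weights are assumptions made for illustration, and the abstract does not say how the weights are chosen (a held-out accuracy per base classifier is one common choice).

```python
from collections import defaultdict


def simple_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label per base classifier for a single comment.
    Ties go to the label that appears first in `predictions`.
    """
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    `weights[i]` is the (hypothetical) trust placed in base classifier i.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three hypothetical base classifiers label the same target-domain comment.
base_predictions = ["pos", "neg", "neg"]
base_weights = [0.9, 0.3, 0.4]  # assumed values, not taken from the thesis

majority_label = simple_vote(base_predictions)
weighted_label = weighted_vote(base_predictions, base_weights)
```

In this example the two strategies disagree: the majority label is "neg", while the single high-weight classifier outweighs the other two under weighted voting and yields "pos". This is the kind of case where the choice of weights matters.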
Keywords/Search Tags: Sentiment Classification, Cross-domain, Feature Learning, Ensemble Classifier, Support Vector Machine