Research On Sentiment Classification Methods Of Web Review Texts

Posted on: 2016-01-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhang
GTID: 1108330503452342
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of e-commerce and social media, users have become increasingly accustomed to publishing reviews about a wide range of objects and topics, including products, news events, and public figures. Given the explosive growth of Web review texts, it is difficult to analyze users' opinions manually and put them to effective use. Sentiment analysis has therefore received much attention in recent years. As a core task in sentiment analysis, sentiment classification aims to identify the sentiment polarity expressed in an opinionated piece of text. It has important academic value and broad application prospects, and it faces many challenging problems. This thesis carries out a series of studies on three such problems: data sparsity, the difficulty of labeling samples, and the imbalance of sentiment resources. The main work and contributions are as follows:

① A sentence-level polarity classification method based on feature enrichment of the text representation and an ensemble technique is proposed. To address data sparsity in sentence-level reviews, this thesis uses a latent topic feature set and a related-word feature set, both learned from a large external unlabeled dataset, as additional features to enrich the data representation. After training classifiers on the enriched text representations, the thesis further proposes an ensemble approach in which these additional features guide the design of the different ensemble members and produce the final classifier. Extensive experimental results demonstrate that both feature sets are effective in alleviating the data sparsity problem, and that the proposed ensemble approach effectively uses the semantic information of both feature sets to improve polarity classification performance.

② An unsupervised sentiment classification framework based on sentiment lexicons and machine learning is proposed. To address the difficulty of labeling samples for supervised learning systems, this thesis proposes an unsupervised sentiment classification framework that does not rely on a manually annotated corpus. The framework performs sentiment classification in two stages. In the first stage, it uses sentiment lexicons to select high-reliability samples from the unlabeled corpus to form a pseudo-labeled training set. In the second stage, it applies a semi-supervised method to the pseudo-labeled training set and the unlabeled data to learn a classifier and obtain the classification results. Experimental results on four public datasets show that the proposed framework can effectively improve classification performance. In addition, the thesis compares the performance of various semi-supervised learning methods and finds that self-training suits the framework because of its good classification performance and strong adaptability.

③ An unsupervised sentiment classification method based on dataset partition and self-training is proposed. In self-training, the sample noise accumulated during the iterative process degrades classification performance. Building on the preceding work, this thesis addresses that problem by introducing an enhanced self-training method based on dataset partition, which uses two classifiers during the iterative process to check classification consistency. Experimental results on four public datasets show that the unsupervised method based on the enhanced self-training classifier effectively reduces the impact of mislabeled samples and has competitive advantages over a set of baseline approaches. It even outperforms the supervised approach on some of the datasets despite using no labeled documents.

④ A cross-language sentiment classification method based on random subspace and co-training is proposed. To address the imbalance of sentiment resources, this thesis studies cross-language sentiment classification in order to make full use of the resources available in different languages. Drawing on linguistic knowledge, a random subspace method based on part of speech is proposed and applied to datasets in two languages. The random subspace method is used to obtain the sub-views for the co-training framework. Experimental results show that, because it yields more redundant views, the part-of-speech-based random subspace method effectively improves the performance of cross-language sentiment classification within the co-training framework.
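
The two-stage framework described in the second contribution (lexicon-based selection of reliable samples, followed by self-training) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy lexicons, the `margin` and `threshold` values, the choice of a Naive Bayes classifier, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Toy lexicons: hypothetical stand-ins for a real sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def lexicon_score(tokens):
    """Number of positive lexicon words minus number of negative ones."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def select_pseudo_labeled(docs, margin=2):
    """Stage 1: keep only documents whose lexicon score is clearly one-sided."""
    labeled, unlabeled = [], []
    for tokens in docs:
        score = lexicon_score(tokens)
        if abs(score) >= margin:
            labeled.append((tokens, 1 if score > 0 else 0))
        else:
            unlabeled.append(tokens)
    return labeled, unlabeled

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (labels 0 and 1)."""

    def fit(self, labeled):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter()
        for tokens, y in labeled:
            self.counts[y].update(tokens)
            self.docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_odds(self, tokens):
        """Log-odds of class 1 versus class 0; the sign gives the prediction."""
        total = sum(self.docs.values())
        result = 0.0
        for y, sign in ((1, 1.0), (0, -1.0)):
            n = sum(self.counts[y].values()) + len(self.vocab)
            lp = math.log((self.docs[y] + 1) / (total + 2))
            for t in tokens:
                if t in self.vocab:  # out-of-vocabulary words are ignored
                    lp += math.log((self.counts[y][t] + 1) / n)
            result += sign * lp
        return result

def self_train(labeled, unlabeled, threshold=1.0, max_iter=10):
    """Stage 2: move confidently classified documents into the training set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = NaiveBayes().fit(labeled)
    for _ in range(max_iter):
        confident, rest = [], []
        for tokens in unlabeled:
            s = model.log_odds(tokens)
            if abs(s) >= threshold:
                confident.append((tokens, 1 if s > 0 else 0))
            else:
                rest.append(tokens)
        if not confident:
            break  # nothing new to learn from
        labeled += confident
        unlabeled = rest
        model = NaiveBayes().fit(labeled)
    return model

# Toy corpus, invented for this example.
docs = [s.split() for s in (
    "great excellent phone love the battery",
    "good great screen and fast delivery",
    "terrible bad battery hate the screen",
    "poor terrible delivery slow and broken",
    "fast delivery and the battery lasts",
    "slow broken screen",
)]
labeled, unlabeled = select_pseudo_labeled(docs)
model = self_train(labeled, unlabeled)
```

Documents that contain no lexicon words can only be classified through words the classifier has learned from the pseudo-labeled set. The same loop also shows how a wrongly pseudo-labeled document would be carried into every later iteration, which is the noise accumulation that the third contribution sets out to reduce.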
Keywords/Search Tags: Sentiment Classification, Feature Enrichment, Semi-supervised Learning, Self-Training, Co-Training