Research On The Sentiment Classification Technology For The Weak Labeled Texts

Posted on: 2017-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: C Xu
GTID: 2308330485960893 | Subject: Computer technology
Abstract/Summary:
With the rapid development of Web technology and the rise of the Internet slogan "user-centric, user participation", more and more users participate in the Internet. As a result, social networking platforms (e.g., free blogs, BBS, Weibo) have generated a large number of reviews. However, because network information grows so quickly, traditional sociological analysis struggles to extract accurate sentiment information from these reviews. Text sentiment classification technology uses the computing power of modern machines to obtain valuable sentiment information from massive numbers of reviews.

To improve the accuracy of sentiment classification, traditional methods often need a large number of labeled samples to learn a classification model. In practice, however, we usually have only unlabeled samples, because labeling samples consumes large amounts of resources. How to leverage unlabeled samples to improve accuracy is therefore especially important. In this thesis, we first use a heuristic algorithm to generate weak labels based on text content, which yields valuable weakly labeled texts. We then use semi-supervised and deep learning methods to train classification models. In detail, our contributions can be summarized as follows:

1. We make a detailed survey of the state of the art in sentiment classification. We first introduce the concept and research progress of sentiment classification, and then survey sentiment classification based on textual features and on neural networks.

2. We propose a self-training method based on maximum sentiment confidence, and we perform empirical studies on real projects to show its effectiveness. This is a semi-supervised method. By introducing the sentiment semantic value and the sentiment classification value, we calculate a sentiment confidence score. Based on the sentiment confidence selection criteria, the self-training method selects unlabeled samples to add to the training set, which improves classification accuracy. Experimental studies demonstrate that our proposed approach significantly outperforms the traditional self-training method.

3. We propose a new word embedding training method based on weak labels, and we perform empirical studies on real projects to show its effectiveness. This is a deep learning method. It first introduces the LAWE network into the classical unsupervised neural network model. By introducing sentiment domain information when training the LAWE network and learning from a large number of unlabeled samples, the trained word embeddings come to contain sentiment domain information. Based on the word embeddings, we obtain the representation of the text. Then, under the supervised learning framework, we use a classifier to evaluate on the validation set. Finally, based on the validation results, we control the training cycle of the LAWE network and determine the best word embeddings. Experimental studies demonstrate the effectiveness of our LAWE method, which achieves better performance than methods of the same type.

4. We analyze and summarize the similarities and differences between the self-training method based on maximum sentiment confidence and the LAWE method. Both methods use a heuristic algorithm to generate weak labels. They differ in that the former is fast and efficient but relies more heavily on selected text features, while the latter extracts features automatically but is slower and less efficient.
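
The confidence-based self-training described in contribution 2 can be illustrated with a minimal sketch. The abstract does not give the formula for the sentiment confidence score, so everything below is an assumption made for illustration: the toy lexicon, the naive Bayes classifier standing in for the "sentiment classification value", the lexicon polarity standing in for the "sentiment semantic value", and the choice to multiply the two so that a sample scores high only when both agree. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would use a full sentiment dictionary.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}


def semantic_value(tokens):
    """Lexicon polarity in [-1, 1]: (positive hits - negative hits) / all hits."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def train_nb(samples):
    """Fit multinomial naive Bayes on (tokens, label) pairs, labels in {0, 1}."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def classification_value(model, tokens):
    """Return P(positive | tokens) with Laplace smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_p = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        lp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for t in tokens:
            # The extra +1 in the denominator reserves mass for unseen words.
            lp += math.log((word_counts[label][t] + 1) / (n_words + len(vocab) + 1))
        log_p[label] = lp
    m = max(log_p.values())
    e0, e1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
    return e1 / (e0 + e1)


def confidence(model, tokens):
    """Assumed combination: classifier confidence times lexicon agreement."""
    p_pos = classification_value(model, tokens)
    s = semantic_value(tokens)
    label = 1 if p_pos >= 0.5 else 0
    clf_conf = max(p_pos, 1 - p_pos)       # in [0.5, 1]
    lex_agree = s if label == 1 else -s    # positive when the lexicon agrees
    return label, clf_conf * max(lex_agree, 0.0)


def self_train(labeled, unlabeled, rounds=3, per_round=1):
    """Each round, move the highest-confidence unlabeled samples into the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train_nb(labeled)
        scored = [(confidence(model, toks), toks) for toks in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        chosen = [item for item in scored[:per_round] if item[0][1] > 0]
        if not chosen:
            break  # nothing is confident enough; stop early
        for (label, _), toks in chosen:
            labeled.append((toks, label))
            pool.remove(toks)
    return train_nb(labeled), labeled
```

In this sketch a sample whose lexicon polarity is neutral or contradicts the classifier gets a score of zero and is never added, which is one simple way to keep noisy pseudo-labels out of the training set. The actual selection criteria used in the thesis may differ.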
Keywords/Search Tags: Sentiment Classification, Semi-supervised Learning, Word Embedding, Neural Network