Research On Sentiment Polarity Analysis Of Chinese Microblog Based On Active Learning And Co-Training

Posted on: 2018-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: W W Li
GTID: 2428330596454804
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of the Internet, and of Web 2.0 technology in particular, has turned users into the producers and managers of Internet content. As a new social networking platform, the microblog is popular with users and is becoming an increasingly common part of ordinary Internet users' lives. Users share their ideas and express their feelings and experiences on microblog platforms, and the volume of emotionally colored, opinionated text produced this way is growing exponentially. Analyzing the sentiment polarity of these subjective microblog posts helps reveal the public's sentiment and opinions about events and products. It also supports public opinion monitoring, event forecasting, commercial competitive intelligence analysis, and similar applications.

Some researchers have already studied sentiment polarity analysis of microblogs, but most of this work focuses on English microblogs. Chinese microblog posts are usually short and colloquial, and they cover a wide range of topics. In addition, the huge volume of microblog data makes annotation difficult and expensive. Against this background, this thesis explores new ideas and methods for sentiment polarity analysis of Chinese microblogs. The main research content and contributions are as follows:

1) Chinese microblog sentiment polarity analysis suffers from insufficient features and sparse text vectors. To address this, the thesis proposes a sentiment feature extraction method that combines N-gram features with 8 microblog-specific semantic features. Extracting the features based on emoticons and on microblog sentence composition rules requires judging the sentiment polarity of each clause, and annotating this manually is laborious. The thesis therefore proposes an algorithm that infers the sentiment tendency of a clause from a sentiment dictionary, which speeds up feature extraction from the corpus. Open-source sentiment dictionaries perform poorly for microblog sentiment polarity classification, so the thesis also proposes a method for constructing a domain-specific sentiment dictionary based on a graph propagation algorithm. Experimental results show that the proposed feature extraction method effectively improves sentiment classification results.

2) Chinese microblog data are abundant, but labeled data are scarce and expensive to obtain. The thesis therefore presents a sentiment classification algorithm for Chinese microblogs based on Active Learning. It enriches the initial training set by combining an uncertainty sampling strategy with a strategy that selects the highest-confidence unlabeled samples. A Long Short-Term Memory (LSTM) model serves as the base classifier of the Active Learning algorithm, in order to study how effective a deep learning model is inside active learning. Experimental results show that the proposed Active Learning algorithm makes effective use of the sentiment information in unlabeled samples and achieves better sentiment classification results.

3) Classification that relies only on lexical features from a text vector space model achieves low accuracy in the sentiment polarity classification of Chinese microblogs. The thesis therefore proposes a revised Co-training algorithm that relaxes the Co-training assumption, so that a sentiment polarity classification method based on word representations can be integrated. The proposed Co-training algorithm builds two classifiers, one based on a Support Vector Machine (SVM) and the other on LSTM, and incorporates the uncertainty sampling strategy of Active Learning. When Co-training ends, the final sentiment polarity of each microblog is determined by a voting strategy. Experimental results verify that the revised Co-training algorithm significantly improves the performance of sentiment polarity classification of Chinese microblogs.

4) The field of microblog sentiment polarity classification lacks a general classification system based on Active Learning and Co-training. The thesis therefore designs and implements a prototype Chinese microblog sentiment polarity classification system based on Active Learning and Co-training. The system automates the whole workflow, including text preprocessing, feature extraction, model training and polarity prediction. It can serve as a foundation for future research on microblog sentiment classification. In the last part of the thesis, microblog sentiment polarity classification experiments are run on this system to validate its practicality and usability.
Keywords/Search Tags: Sentiment polarity analysis, Chinese microblog, SVM, LSTM, Co-training
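Point 1) combines N-gram features with 8 microblog semantic features, but the abstract does not list those 8 features. The following Python sketch assumes character bigrams, which need no Chinese word segmenter. It adds four hypothetical microblog features: bracketed Weibo-style emoticon tags such as `[哈哈]`, exclamation marks, question marks and @-mentions. The function names and feature choices are illustrative assumptions, not the thesis's actual feature set.

```python
import re
from collections import Counter

# Assumption: emoticons appear as bracketed tags such as "[哈哈]" (Weibo style).
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")


def char_ngrams(text, n=2):
    """Count character n-grams; this avoids needing a Chinese word segmenter."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def microblog_features(text):
    """Hypothetical stand-ins for the thesis's 8 microblog semantic features."""
    return {
        "num_emoticons": len(EMOTICON_RE.findall(text)),
        "num_exclaims": text.count("!") + text.count("！"),
        "num_questions": text.count("?") + text.count("？"),
        "has_mention": int("@" in text),
    }


def extract_features(text, n=2):
    """Combine n-gram counts (emoticon tags stripped) with microblog features."""
    plain = EMOTICON_RE.sub("", text)
    feats = {"ng:" + g: c for g, c in char_ngrams(plain, n).items()}
    feats.update(microblog_features(text))
    return feats
```

For example, `extract_features("好开心[哈哈]！")` yields the bigrams 好开, 开心 and 心！ together with `num_emoticons=1` and `num_exclaims=1`. Stripping emoticon tags before building n-grams keeps bracket fragments such as `[哈` out of the sparse vector.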
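Point 1) also infers each clause's sentiment tendency from a sentiment dictionary rather than from manual annotation. The abstract does not give the scoring rule, so this is a minimal sketch under assumed rules:

- clauses are split at punctuation;
- lexicon words are matched greedily, longest first;
- a word's weight is flipped when the character before it is a negation word;
- the sign of the summed weights is the clause polarity.

`LEXICON`, `NEGATIONS` and their weights are hypothetical toy values.

```python
import re

# Hypothetical toy lexicon; a real system would load a full sentiment dictionary.
LEXICON = {"开心": 1.0, "喜欢": 1.0, "好": 0.5, "难过": -1.0, "失望": -1.0, "差": -0.5}
NEGATIONS = {"不", "没", "别"}
# Split a post into clauses on Chinese and ASCII punctuation.
CLAUSE_SPLIT = re.compile(r"[，。！？；,.!?;]+")


def clause_polarity(clause, lexicon=LEXICON):
    """Return +1, -1 or 0 by summing lexicon weights over greedy matches."""
    words = sorted(lexicon, key=len, reverse=True)  # try longest words first
    score, i = 0.0, 0
    while i < len(clause):
        match = next((w for w in words if clause.startswith(w, i)), None)
        if match is None:
            i += 1
            continue
        weight = lexicon[match]
        # Simplified negation rule: flip if the preceding character negates.
        if i > 0 and clause[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
        i += len(match)
    return (score > 0) - (score < 0)


def post_clause_polarities(post, lexicon=LEXICON):
    """Infer a polarity label for every non-empty clause of a microblog post."""
    return [clause_polarity(c, lexicon) for c in CLAUSE_SPLIT.split(post) if c]
```

`post_clause_polarities("今天很开心，但是服务不好。")` returns `[1, -1]`. The first clause matches 开心. In the second clause, 好 is flipped by the preceding 不. A one-character negation window is a deliberate simplification.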
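For the domain-specific dictionary in point 1), the abstract names a graph propagation algorithm but gives no details. One common form is label propagation:

- candidate words are nodes;
- edge weights express word similarity (for example, co-occurrence in the microblog corpus, which is an assumption here);
- seed words keep fixed polarities;
- every other word repeatedly takes the weighted average of its neighbours' scores.

The sketch assumes that formulation. `propagate_polarity`, `to_lexicon` and the threshold value are hypothetical.

```python
from collections import defaultdict


def propagate_polarity(edges, seeds, max_iter=100, tol=1e-6):
    """
    Label propagation over an undirected word graph.

    edges: {(word_a, word_b): weight} with non-negative similarity weights.
    seeds: {word: +1.0 or -1.0}; seed scores stay fixed (clamped).
    Returns {word: score}; each non-seed score converges to the weighted
    average of its neighbours' scores.
    """
    nbrs = defaultdict(dict)
    for (a, b), w in edges.items():
        nbrs[a][b] = nbrs[a].get(b, 0.0) + w
        nbrs[b][a] = nbrs[b].get(a, 0.0) + w
    words = set(nbrs) | set(seeds)
    scores = {w: seeds.get(w, 0.0) for w in words}
    for _ in range(max_iter):
        new_scores = dict(scores)
        delta = 0.0
        for w in words:
            total = sum(nbrs[w].values()) if w in nbrs else 0.0
            if w in seeds or total == 0:
                continue
            new_scores[w] = sum(wt * scores[n] for n, wt in nbrs[w].items()) / total
            delta = max(delta, abs(new_scores[w] - scores[w]))
        scores = new_scores
        if delta < tol:
            break
    return scores


def to_lexicon(scores, threshold=0.2):
    """Keep words whose propagated score is clearly positive or negative."""
    return {w: (1 if s > 0 else -1) for w, s in scores.items() if abs(s) >= threshold}
```

Take seeds 好 (+1) and 差 (-1) with the toy edges (给力, 好) = 1.0, (给力, 坑) = 0.2 and (坑, 差) = 1.0. The scores then converge to about 0.714 for 给力 and -0.714 for 坑. Both slang words enter the domain lexicon with the expected polarity, although neither appears in a general-purpose dictionary.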
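Point 2) enriches the training set by combining two selection strategies. Assume the base classifier outputs a probability of the positive class for each unlabeled microblog. One selection round can then be sketched as follows:

- the samples nearest the 0.5 decision boundary (the most uncertain) go to a human annotator;
- the most confident of the remaining samples are added with the classifier's own label.

The function name and the use of the margin |p - 0.5| as the uncertainty measure are assumptions. The thesis uses an LSTM as the base classifier; here it is abstracted away as the `probs` dictionary.

```python
def select_samples(probs, n_uncertain, n_confident):
    """
    One selection step combining two strategies.

    probs: {sample_id: P(positive)} predicted by the current base classifier
           on the unlabeled pool.
    Returns (to_annotate, pseudo_labeled):
      to_annotate: ids closest to the 0.5 boundary, sent for manual labeling
      pseudo_labeled: {id: 0 or 1} for the most confident remaining ids,
                      labeled automatically with the classifier's prediction
    """
    def margin(i):
        return abs(probs[i] - 0.5)

    to_annotate = sorted(probs, key=margin)[:n_uncertain]
    chosen = set(to_annotate)
    remaining = [i for i in probs if i not in chosen]
    confident = sorted(remaining, key=margin, reverse=True)[:n_confident]
    pseudo_labeled = {i: int(probs[i] >= 0.5) for i in confident}
    return to_annotate, pseudo_labeled
```

After each round, both groups join the training set, the base classifier is retrained, and `probs` is recomputed on the shrinking pool. Consider `probs = {"a": 0.52, "b": 0.97, "c": 0.10, "d": 0.45, "e": 0.70}` with two samples of each kind. Then `a` and `d` are sent for annotation, while `b` (positive) and `c` (negative) are pseudo-labeled.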
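Point 3) trains an SVM and an LSTM as two classifiers, adds the uncertainty sampling of Active Learning, and decides the final label by voting. The abstract does not state the exact exchange and voting rules. This sketch therefore makes three assumptions:

- it uses the standard co-training exchange, in which each classifier's most confident predictions become training data for the other;
- it sends samples on which both classifiers are unsure to an annotator;
- it uses soft voting (averaging the two probabilities) for the final label.

Both classifiers are abstracted as probability dictionaries, and all function names are hypothetical.

```python
def cotrain_exchange(probs_a, probs_b, n_confident, n_uncertain):
    """
    One co-training round over a shared unlabeled pool.

    probs_a, probs_b: {sample_id: P(positive)} from classifier A (e.g. SVM)
                      and classifier B (e.g. LSTM); both cover the same ids.
    Returns (for_a, for_b, to_annotate):
      for_a: {id: label} pseudo-labeled by B, to be added to A's training set
      for_b: {id: label} pseudo-labeled by A, to be added to B's training set
      to_annotate: ids on which even the more confident classifier is unsure
    """
    def margin(p):
        return abs(p - 0.5)

    ids = list(probs_a)
    # Uncertainty sampling: rank by the larger of the two margins, ascending.
    joint = sorted(ids, key=lambda i: max(margin(probs_a[i]), margin(probs_b[i])))
    to_annotate = joint[:n_uncertain]
    skip = set(to_annotate)
    rest = [i for i in ids if i not in skip]

    # Each classifier teaches the other with its most confident predictions.
    top_a = sorted(rest, key=lambda i: margin(probs_a[i]), reverse=True)[:n_confident]
    top_b = sorted(rest, key=lambda i: margin(probs_b[i]), reverse=True)[:n_confident]
    for_b = {i: int(probs_a[i] >= 0.5) for i in top_a}
    for_a = {i: int(probs_b[i] >= 0.5) for i in top_b}
    return for_a, for_b, to_annotate


def vote(p_a, p_b):
    """Final label by soft voting: average the two P(positive) values."""
    return int((p_a + p_b) / 2 >= 0.5)
```

With only two voters, a plain majority vote ties whenever the hard labels disagree. Soft voting resolves such cases in favour of the more confident classifier. For example, `vote(0.48, 0.90)` returns 1 because the second classifier is far more certain.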