
Research On Sentiment Analysis Of Microblog Based On Semi-supervised Recursive Auto Encoder

Posted on: 2015-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: J J Cheng
Full Text: PDF
GTID: 2348330509460652
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of Web technology and the spread of smart mobile devices, microblogs have become an important medium through which network users access information, express opinions and make friends online, owing to their simplicity and openness. Microblogging has changed the traditional mode of Internet interaction and, being real-time, widespread and highly interactive, has become a sensor of real society. Studying microblog users' sentiments and opinions on hot social events is therefore important for grasping trends in public opinion and guiding it in a timely manner.

Taking hot topics on Sina Microblog as the case target, this paper focuses on three key technologies and methods: data collection, sentiment classification of text, and analysis of hot topics. The details are as follows:

1) Design and implementation of an active crawler for hot topics on Sina Microblog based on a mixed strategy. According to the characteristics of the data, we designed a method that combines web page analysis with the Sina Microblog API and can work around Sina's restrictions on data collection. Based on this design, a data crawler was implemented in Java with a MySQL database.

2) Study of sentiment classification of microblog texts based on a semi-supervised recursive auto encoder. Microblog texts are typically short and colloquial; this paper therefore uses a recursive auto encoder, which captures sentence structure well, to classify their sentiment. The recursive auto encoder based method achieved higher accuracy than an SVM based method on three public datasets. A semi-supervised training method for the recursive auto encoder was then presented, making it more accurate, more general and more stable.

3) Sentiment analysis of hot topics on Sina Microblog. The collected hot topic data is classified as neutral, positive or negative by the semi-supervised recursive auto encoder based sentiment classification method. On that basis, the sentiment distribution of different kinds of hot topics and of extremely negative topics is analyzed. We find that microblog users express positive sentiment in the majority of hot topics, especially topics on entertainment, technology and sport. However, in most topics related to social events and government, microblog users express negative sentiment. In addition, most extremely negative topics are social events, especially government related events. Finally, a correlation analysis across multiple topics is conducted, taking the Fang Zuming and Ke Zhendong drug events as a case.

In summary, centered on sentiment analysis of microblogs, this paper studies two key technologies: a microblog topic data collection method based on web page analysis combined with the Microblog API, and a sentiment classification method for microblog texts based on a semi-supervised recursive auto encoder. It also analyzes the patterns in sentiment distribution across different kinds of hot topics on Sina Microblog. The research is significant for the analysis and guidance of public opinion.
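As a rough illustration of how a recursive auto encoder can build a sentence representation from word vectors, the sketch below repeatedly merges the adjacent pair of vectors with the lowest reconstruction error until a single root vector remains. It is a minimal sketch under stated assumptions, not the method of this thesis: the function names (`make_params`, `encode`, `reconstruction_error`, `greedy_compose`), the tanh nonlinearity, the random untrained weights and the toy input are all hypothetical, and the semi-supervised training step and the sentiment classifier are not shown.

```python
import math
import random


def make_params(dim, seed=0):
    """Create small random encoder/decoder weights (untrained, illustrative only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_enc": mat(dim, 2 * dim), "b_enc": [0.0] * dim,
        "W_dec": mat(2 * dim, dim), "b_dec": [0.0] * (2 * dim),
    }


def affine(W, x, b):
    """Compute W @ x + b for plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(params, left, right):
    """Compose two child vectors into one parent vector of the same size."""
    return [math.tanh(z) for z in affine(params["W_enc"], left + right, params["b_enc"])]


def reconstruction_error(params, left, right):
    """Return the parent vector and the squared error of rebuilding both children from it."""
    parent = encode(params, left, right)
    recon = [math.tanh(z) for z in affine(params["W_dec"], parent, params["b_dec"])]
    err = sum((r - c) ** 2 for r, c in zip(recon, left + right))
    return parent, err


def greedy_compose(params, vectors):
    """Greedily merge the adjacent pair with the lowest reconstruction error.

    Expects at least one vector. Returns the root vector, the summed
    reconstruction error, and the list of merge positions.
    """
    nodes = [list(v) for v in vectors]
    total_error = 0.0
    merges = []
    while len(nodes) > 1:
        candidates = [
            reconstruction_error(params, nodes[i], nodes[i + 1])
            for i in range(len(nodes) - 1)
        ]
        best = min(range(len(candidates)), key=lambda i: candidates[i][1])
        parent, err = candidates[best]
        nodes[best:best + 2] = [parent]  # replace the pair with its parent
        total_error += err
        merges.append(best)
    return nodes[0], total_error, merges


if __name__ == "__main__":
    params = make_params(dim=4, seed=1)
    # Toy "word vectors" for a five-word text (made-up values).
    words = [[0.1 * i + 0.01 * j for j in range(4)] for i in range(5)]
    root, error, merges = greedy_compose(params, words)
    print(len(root), len(merges))
```

In a trained model the weights would be learned by minimizing this reconstruction error, and the root vector would be passed to a classifier that predicts the neutral, positive or negative label.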
Keywords/Search Tags:Microblog, Sentiment Analysis, Recursive auto encoder, Semi-supervised learning