Research And Implementation On Public Opinion Classification Of Microblogging Based On Hadoop

Posted on: 2017-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: C J Hu
Full Text: PDF
GTID: 2348330518495897
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the continuing development of new social media such as microblogging and a rapidly growing user base, huge amounts of new information are produced on microblogging and other platforms every day. Understanding and closely monitoring public opinion helps promote social harmony and stability, and grasping the dynamics of public opinion is of practical importance. This thesis addresses existing problems in mining and analyzing public opinion on microblogging. The text topic model LDA (Latent Dirichlet Allocation), text categorization algorithms, and machine learning algorithms are applied to mining public opinion events and to classifying the emotion expressed about important public events. The main research contents and innovations are:

1. To address the imbalanced training datasets caused by the small proportion of target public opinion events, several sampling methods for imbalanced data are proposed at the data level, based on the LDA topic model and multiple support vector machines (SVMs). These methods reduce the negative effect of the imbalance on subsequent algorithms. At the ensemble-learning level, a method that combines multiple SVM models and aggregates their results is proposed to reduce the text classification error of a single classification model and to improve the recognition of target public events. Experiments on microblogging data show that the aggregated model outperforms a single SVM model. In addition, this semi-supervised aggregated model can make full use of a large number of unlabeled samples to improve classifier performance, and it reduces the effort of manual annotation to some extent.

2. A hybrid emotion classification algorithm that combines an unsupervised clustering algorithm with a supervised learning algorithm is proposed. In addition to comparing commonly used supervised algorithms such as decision trees and random forests, the thesis presents a combination of the unsupervised clustering algorithm K-means and the supervised learning algorithm random forests for emotion classification. In binary emotion classification, the hybrid algorithm improves accuracy by 1% over commonly used emotion classification algorithms. To show that the hybrid algorithm scales well, the thesis extends the binary classification to fine-grained emotion classification. The results show that, when the number of clusters is optimal, the hybrid algorithm achieves a 2% improvement over the traditional classification model.

3. A microblogging public opinion analysis system based on the Hadoop platform is proposed. It identifies major public opinion events while accounting for the sparse and fragmented nature of microblogging data. The system can also produce an emotion analysis report for a given public opinion event, which makes it easier to understand public attitudes, follow how public opinion develops, and make decisions about public opinion events.
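The ensemble idea in item 1 (train several classifiers on balanced subsets, then aggregate their votes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. A nearest-centroid classifier stands in for the SVM base learner so that the example needs only the Python standard library. Plain random undersampling stands in for the LDA-based sampling. The two-dimensional toy features and all names (`make_toy_data`, `train_balanced_ensemble`, `predict_vote`) are hypothetical.

```python
import random
from collections import Counter


def make_toy_data(n_other=200, n_event=20, seed=0):
    """Hypothetical imbalanced data: many 'other' posts, few 'event' posts."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_other)]
    y = ["other"] * n_other
    X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(n_event)]
    y += ["event"] * n_event
    return X, y


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in base learner (NOT an SVM): predicts the nearest class centroid."""

    def fit(self, X, y):
        by_label = {}
        for row, label in zip(X, y):
            by_label.setdefault(label, []).append(row)
        self.centroids = {lab: centroid(rows) for lab, rows in by_label.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))


def train_balanced_ensemble(X, y, minority_label, n_models=5, seed=0):
    """Train n_models learners, each on all minority rows plus an
    equally sized random sample of the majority rows."""
    rng = random.Random(seed)
    minority = [r for r, lab in zip(X, y) if lab == minority_label]
    majority = [(r, lab) for r, lab in zip(X, y) if lab != minority_label]
    models = []
    for _ in range(n_models):
        sample = rng.sample(majority, min(len(minority), len(majority)))
        X_bal = minority + [r for r, _ in sample]
        y_bal = [minority_label] * len(minority) + [lab for _, lab in sample]
        models.append(NearestCentroid().fit(X_bal, y_bal))
    return models


def predict_vote(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(m.predict_one(row) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X, y = make_toy_data()
    models = train_balanced_ensemble(X, y, minority_label="event")
    print(predict_vote(models, [4.5, 5.2]))
```

Using an odd number of models avoids tied votes in the two-class case. A real SVM base learner and the LDA-based sampling step would replace `NearestCentroid` and the `rng.sample` call respectively.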
Keywords/Search Tags: Microblogging, public opinion, text classification, emotion classification, support vector machine (SVM)