
Research On Rumor Detection Of Weibo Based On Semi-supervised Learning

Posted on: 2016-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Q Lu
GTID: 2308330461984238
Subject: Computer Science and Technology
Abstract/Summary:
As a product of the high-tech information age, microblogging has developed rapidly. At the same time, rumors on microblogs have become an increasingly prominent problem. Because automatic detection is a prerequisite for studying, monitoring, responding to, and managing rumors, it is attracting growing attention, and research on microblog rumor detection is expanding accordingly. Researchers in China and abroad have done a large amount of work on the credibility of social networks and microblogs, especially Twitter. The main idea of this research is to train a classifier to detect rumors using features extracted from users, text, propagation, and so on. In microblog rumor detection, labeling rumors correctly requires a huge amount of manpower and time. In addition, class imbalance in the data hinders the correct recognition of microblog rumors. These problems cannot be resolved effectively with traditional machine learning.

This thesis takes Sina Weibo as its setting and focuses on microblog rumors, following the framework in which other researchers treat detection as a classification problem. To address the high cost of labeling data for traditional supervised learning algorithms, it introduces semi-supervised learning into microblog rumor detection. Meanwhile, because microblog rumors are far fewer than non-rumors, and accurately identifying a rumor is more valuable than accurately identifying a non-rumor, microblog rumor detection is defined as a two-class classification problem on imbalanced data. On the basis of these considerations, this thesis proposes a semi-supervised learning algorithm for imbalanced data sets.

The work of the thesis is reflected mainly in the following two aspects. First, the thesis proposes an improved method based on the Co-Forest algorithm that can be applied to imbalanced data sets. The method uses the SMOTE algorithm and stratified sampling to balance the data distribution. Moreover, it improves the accuracy of the labels assigned to unlabeled samples through cost-sensitive weighted voting. Experimental results on 10 UCI data sets show that the algorithm is effective and efficient. Second, building on this study of imbalanced data classification, the thesis introduces machine learning methods for imbalanced data into the field of microblog rumor detection and gives a framework for microblog rumor detection. Finally, two groups of empirical experiments on microblog rumors illustrate the effectiveness and superiority of the algorithm.

Experiments on data extracted from the Sina Microblog platform show that the algorithm can address both the high cost of labeling microblog rumors and the low detection accuracy caused by class imbalance. In conclusion, it is suitable for analyzing large microblog data sets and detecting rumors in them.
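The abstract names two building blocks, SMOTE oversampling and cost-sensitive weighted voting, without giving their details. The following is a minimal Python sketch of the general techniques, not the thesis's implementation. The function names `smote` and `cost_sensitive_vote`, the parameters `k` and `rumor_cost`, and the convention that label 1 means "rumor" are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours, which is
    the basic SMOTE interpolation step.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of base, excluding base itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def cost_sensitive_vote(votes, weights, rumor_cost=3.0):
    """Weighted majority vote in which rumor votes count rumor_cost times.

    votes   -- one 0/1 label per ensemble member (1 = rumor, assumed)
    weights -- one non-negative weight per member, e.g. its estimated accuracy
    Returns (label, confidence), where confidence is the winning share of
    the total weighted score.
    """
    score = {0: 0.0, 1: 0.0}
    for label, weight in zip(votes, weights):
        score[label] += weight * (rumor_cost if label == 1 else 1.0)
    total = score[0] + score[1]
    winner = 1 if score[1] >= score[0] else 0
    return winner, score[winner] / total
```

In a Co-Forest style loop, a confidence such as the one returned by `cost_sensitive_vote` would typically be compared against a threshold to decide whether an unlabeled sample is added to a tree's training set. The threshold and the cost value used in the thesis are not stated in the abstract.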
Keywords/Search Tags: Microblog, Rumor detection, Imbalanced data, Semi-supervised learning