Research On Method Of Malicious Weibo User Identification

Posted on: 2018-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2348330512495174
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the Internet, social networks such as Twitter and Facebook have grown dramatically and have become an indispensable part of modern life. In China, Weibo is the most popular social network application. It has moved beyond pure social contact to become a center of information diffusion, and it influences public opinion. Research on countering malicious users therefore has important practical significance, and techniques for identifying them are an active research topic. This thesis studies the problem of identifying malicious users on Weibo. The work is partly supported by the National Natural Science Foundation of China (Nos. 61271308, 61172072, 61401015) and the Academic Discipline and Postgraduate Education Project of the Beijing Municipal Commission of Education.

The thesis addresses the following issues. First, starting from the features of malicious users, it analyzes the differences between malicious and normal users in their use of Weibo's "collection" function, taking into account both Weibo's functional features and users' habits. Two new features, "collection quantity" and "collection speed", are then added to the feature list, and their contribution to identifying malicious users is verified. The classification algorithms provided by Weka are used, together with parameter tuning.

Second, to address the problem of missing user information, classification performance before and after handling the missing data is compared across three methods: Naive Bayes, the C4.5 decision tree, and Random Forest. The comparison shows that when data are missing, both the C4.5 decision tree and Random Forest are robust, especially the latter.

Third, the thesis simulates a practical setting to examine how identification efficiency can be improved on large-scale data. The proposed methods are implemented on the Hadoop platform, and the processing time for data sets of different sizes on different numbers of nodes is compared, along with the resulting identification performance.

In summary, the thesis analyzes the differences between malicious and normal users from the perspective of user features and, on that basis, selects suitable classification algorithms to identify malicious users. The results indicate that identification accuracy reaches about 90%.
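The abstract names the two added features, "collection quantity" and "collection speed", but does not give their formulas. The Python sketch below is one plausible reading, not the thesis's own definition: it assumes quantity is the number of collection events for a user and speed is events per day over the observed time span. The function names `collection_features` and `impute_mean` are hypothetical, and mean imputation is only one common way to handle missing values; the abstract does not say which treatment the thesis uses.

```python
from datetime import datetime
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def collection_features(
    timestamps: Optional[Sequence[datetime]],
) -> Tuple[Optional[int], Optional[float]]:
    """Return (collection_quantity, collection_speed) for one user.

    Assumption: speed = collection events per day over the observed span.
    Returns (None, None) when the user's collection record is missing.
    """
    if timestamps is None:
        return None, None
    quantity = len(timestamps)
    if quantity < 2:
        # A time span cannot be measured from fewer than two events.
        return quantity, None
    span_days = (max(timestamps) - min(timestamps)).total_seconds() / 86400
    speed = quantity / span_days if span_days > 0 else None
    return quantity, speed


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values.

    Assumption: mean imputation stands in for the unspecified
    missing-data treatment described in the abstract.
    """
    observed = [v for v in column if v is not None]
    fill = mean(observed) if observed else 0.0
    return [fill if v is None else v for v in column]
```

A feature table built this way could then be passed to a classifier such as Naive Bayes, C4.5, or Random Forest, once with the missing entries left as they are and once after imputation, to mirror the before-and-after comparison the abstract describes.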
Keywords/Search Tags:Weibo, malicious users, machine learning, random forest, Hadoop