
Research On Malicious User Identification Of Weibo Based On Machine Learning Classification Algorithms

Posted on: 2020-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: L H Zhou
Full Text: PDF
GTID: 2428330590471777
Subject: Computer technology
Abstract/Summary:
In recent years, various social applications have influenced people's daily lives. Among them, Weibo has become a popular platform for obtaining information and sharing feelings, owing to its rapid transmission and wide information coverage. While Weibo makes social networking more convenient, it also hosts many malicious users who post junk information, sway public opinion, and disrupt normal use of the platform. This thesis takes malicious users on Sina Weibo as its research object. By analyzing the behaviors and account information of these users, it builds a feature set for identifying them and applies several machine learning classification algorithms to the task. The main contributions of this thesis are as follows:

(1) To address the limitations of collecting Weibo user data through the Weibo interface, this thesis designs a crawler program that collects the relevant user data according to the experimental requirements. The crawler needs no frequent manual intervention and is not subject to the limits on website interface calls. In actual runs, the crawler collected Weibo user information efficiently and accurately.

(2) This thesis studies machine users and advertising users on Weibo. To improve the identification of these two kinds of malicious users, it analyzes their characteristics by plotting the cumulative distribution of each feature over the number of users. Three new features are proposed: the number of likes, the personal introduction, and the proportion of pages and recommendation attention. These are added to the traditional feature set to form a new feature set. The k-nearest neighbor algorithm, the naive Bayes classification algorithm, and the decision tree algorithm are each used in experiments. The results show that the new feature set identifies machine users and advertising users with higher accuracy than the traditional feature set.

(3) Because the traditional naive Bayes classification algorithm (NBC) does not perform well at identifying malicious users, this thesis improves it in two steps. First, considering that different features differ in their importance to the classification result, the thesis proposes a weighted naive Bayes classification algorithm based on information gain (WNBCI). The algorithm constructs a feature ranking table and obtains the weight of each feature by calculating its information gain. Experimental results show that WNBCI is more accurate than NBC. Second, considering that different values of the same feature affect the classification result differently, the thesis builds on WNBCI to propose a weighted naive Bayes classification algorithm based on information gain and the Gini index (WNBCIG). This algorithm combines the information gain of each feature with the Gini index of its different values to obtain the weight of each feature. Experimental results show that WNBCIG is more accurate than WNBCI.

(4) This thesis designs a malicious user identification system for Weibo, which can be used to obtain hot topics and users' public information on Weibo. The system uses the WNBCIG algorithm to identify malicious users, which makes it a convenient tool for researchers.
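The idea behind WNBCI in item (3), weighting each feature's contribution to naive Bayes by its information gain, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's exact weighting formula, so several choices here are assumptions made only for illustration: weights are taken proportional to information gain and scaled to average 1, features are categorical, Laplace smoothing is used, and the feature names and toy data are hypothetical. The Gini-index refinement of WNBCIG is not shown, since the abstract does not say how the two measures are combined.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Label entropy minus the entropy left after splitting on one feature."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def feature_weights(rows, labels):
    """Assumed scheme: weight proportional to information gain, mean weight 1."""
    n_features = len(rows[0])
    gains = [information_gain([r[i] for r in rows], labels)
             for i in range(n_features)]
    total = sum(gains)
    if total == 0:  # no feature is informative: fall back to plain naive Bayes
        return [1.0] * n_features
    return [g * n_features / total for g in gains]


def train(rows, labels):
    """Collect the counts needed for prediction, plus the feature weights."""
    n_features = len(rows[0])
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    domains = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return {
        "n": len(labels),
        "class_counts": Counter(labels),
        "value_counts": value_counts,
        "domains": domains,
        "weights": feature_weights(rows, labels),
    }


def predict(model, row):
    """Return argmax over classes of log P(c) + sum_i w_i * log P(x_i | c)."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])
        for i, value in enumerate(row):
            # Laplace-smoothed conditional probability of this feature value.
            num = model["value_counts"][(i, label)][value] + 1
            den = count + len(model["domains"][i])
            score += model["weights"][i] * math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (has_personal_introduction, like_count_level).
rows = [("no", "low"), ("no", "low"), ("no", "high"),
        ("yes", "high"), ("yes", "high"), ("yes", "low")]
labels = ["malicious", "malicious", "malicious",
          "normal", "normal", "normal"]

model = train(rows, labels)
print(model["weights"])
print(predict(model, ("no", "high")))
```

In this toy data the first feature separates the classes perfectly, so it receives most of the weight and dominates the prediction. With all weights equal to 1 the scoring rule reduces to ordinary naive Bayes.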
Keywords/Search Tags: malicious users, classification algorithms, naive bayes, feature weighting