
Research On Abnormal User Detection Based On Unbalanced Data In Social Networks

Posted on: 2022-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Han
Full Text: PDF
GTID: 2518306530977119
Subject: Information management and information systems
Abstract/Summary:
With the rapid development of the Internet, social networks have become an important social tool in people's daily lives. However, abnormal users keep emerging in these networks, and the harm they cause is becoming more serious. Identifying abnormal users more accurately and effectively therefore plays an important role in protecting users' rights, building a healthy and stable network environment, and maintaining good social order. First, as its novel contribution, this thesis incorporates user behavior characteristics and published content information into the attribute list. Principal component analysis shows that the selected attributes capture the main factors that determine whether a user is abnormal. Second, the imbalance of the collected data set is the focus of the research: a hybrid sampling method based on Hellinger distance is used to balance the data. Finally, two algorithms that perform well on binary classification problems, the C4.5 decision tree and the SMO algorithm for support vector machines (SVM), are selected to classify and predict on the data sets before and after balancing. The results show that, on the data set balanced by Hellinger-distance-based hybrid sampling, the C4.5 decision tree model improves on plain hybrid sampling by 5.8% in prediction accuracy, with the separately reported accuracy figure up 9.5%, recall up 3.2%, and F1 score up 6.7%. The SVM SMO model improves on plain hybrid sampling by 1% in prediction accuracy, 2.1% in recall, and 0.9% in F1 score. After balancing, the classification and prediction accuracy of the C4.5 decision tree increases by 8.5%, with the separately reported accuracy figure up 4.7%; for the SVM SMO algorithm, the corresponding increases are 2.6% and 0.6%. The experimental results show that, with either the C4.5 decision tree or the SVM SMO algorithm, prediction accuracy and overall performance are significantly higher on balanced data sets than on unbalanced ones, and that Hellinger-distance-based hybrid sampling is a better preprocessing method for unbalanced data than plain hybrid sampling.
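The abstract does not say how Hellinger distance enters the sampling procedure, so the following is only a minimal sketch of the distance itself, not the thesis's method. For two discrete distributions P and Q it is H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2 / 2), which lies between 0 (identical) and 1 (disjoint support). The function names, the binned feature, and the toy data below are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    `p` and `q` are equal-length sequences of probabilities over the same bins.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def class_histogram(values, bins):
    """Normalized histogram of `values` over the ordered category list `bins`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return [counts[b] / len(values) for b in bins]


# Hypothetical toy data: one binned feature (say, posting frequency)
# observed for the majority (normal) and minority (abnormal) class.
bins = ["low", "mid", "high"]
normal_users = ["low", "low", "low", "mid", "mid", "high"]
abnormal_users = ["high", "high", "mid"]

distance = hellinger(
    class_histogram(normal_users, bins),
    class_histogram(abnormal_users, bins),
)
print(f"Hellinger distance between class distributions: {distance:.3f}")
```

Because each class histogram is normalized separately, the distance does not depend on how many samples each class has, which is one reason it is often used with skewed class ratios.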
Keywords/Search Tags: social network, abnormal user classification detection, hybrid sampling based on Hellinger distance, decision tree C4.5 algorithm, support vector machine SMO algorithm