
Research On Micro-blog Spammer Identification Based On Feature Selection

Posted on: 2024-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wu
Full Text: PDF
GTID: 2558306914969929
Subject: Computer technology
Abstract/Summary:
In recent years, with the continued development of the Internet, the advantages of social media have become increasingly apparent. Online media such as Twitter and Sina Weibo provide novel ways of social communication. Because of its convenience, Weibo has attracted huge attention, but it is also flooded with a large number of spammers, whose presence degrades the overall quality of the platform. How to identify spammers accurately has therefore become a hot topic in micro-blog research. Traditional identification methods rely on manual filtering and similar techniques, which are hard to apply effectively against spammers at large scale. Many researchers have therefore applied machine learning algorithms to spammer identification. However, as the amount of data grows, the recognition performance of these algorithms deteriorates. For large-scale data, feature analysis is essential, and feature selection is an effective approach to it, yet few scholars have so far studied feature selection in the field of micro-blog spammer identification.

The main problem in micro-blog spammer identification is the poor performance of identification models, and feature selection can effectively reduce the data dimension and improve identification accuracy. Solving this problem requires a model that can quickly discover spammers in social media and so identify them effectively on the micro-blog platform. Starting from the aspects not yet covered by current research and from the shortcomings of existing work, this thesis proposes a random forest-based Weibo spammer identification model, the RF-RF model. The main work is as follows:

(1) A random forest-based model for identifying micro-blog spammers, the RF-RF model, is proposed. Unlike traditional identification methods, the RF-RF model combines a feature selection algorithm with a classification learning algorithm. Random forest feature importance is first used to screen the features, and a random forest classifier is then trained on the selected features; together these two stages form the RF-RF model.

(2) To verify the effectiveness of the proposed RF-RF model, the model was tested on a real Twitter data set and evaluated with several classification metrics. The process is as follows. First, the feature set is established. Second, random forest feature selection is compared with other approaches, including manual feature selection, correlation testing and recursive feature elimination, using accuracy, recall and F1 score. Third, comparative verification is carried out on multiple data sets, with AUC used to evaluate the model. The results verify the effectiveness of random forest-based feature selection, and the corresponding RF-RF model shows more stable performance when this method is used to select features. Finally, classification algorithms such as SVM, KNN and RF are trained both on the features selected by this method and on the original features. The experimental results verify the superiority and effectiveness of the proposed RF-RF model.
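The two-stage idea described in (1) and the metrics listed in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes scikit-learn is available, uses a synthetic data set from `make_classification` in place of the Twitter features, and picks a median importance threshold and 100 trees arbitrarily; the helper name `build_rf_rf` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def build_rf_rf(n_trees=100, seed=0):
    """Hypothetical helper: RF importance-based selection, then an RF classifier."""
    # Stage 1: keep features whose random forest importance is at least the median.
    # The "median" threshold is an assumption, not a value taken from the thesis.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        threshold="median",
    )
    # Stage 2: a second random forest trained only on the selected features.
    classifier = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    return Pipeline([("select", selector), ("classify", classifier)])


# Synthetic stand-in for a labelled account data set (1 = spammer, 0 = normal user).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = build_rf_rf()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
n_selected = int(model.named_steps["select"].get_support().sum())

print(f"features kept: {n_selected} of {X.shape[1]}")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Wrapping both stages in one `Pipeline` means the importance ranking is computed on the training split only, so the test split does not influence which features are kept. Swapping the final step for an SVM or KNN classifier would give the kind of comparison described at the end of (2).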
Keywords/Search Tags:Feature Selection, Spammer Identification, Random Forest, Micro-blog, Social Media