Research On Spammer Detection Techniques In Social Networks

Posted on: 2018-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Qi
Full Text: PDF
GTID: 2348330569986541
Subject: Software engineering
Abstract/Summary:
Social networks have become important platforms for obtaining, sharing, and disseminating information. However, their vast user bases have also attracted many profit-driven spammers, who cause severe harm to legitimate users and to the platforms themselves. Social networks expose many kinds of user features, so selecting appropriate features for spammer detection is a key problem. At the same time, spammer detection relies mainly on machine learning algorithms. Unsupervised detection algorithms need no labeled data, but their accuracy is too low to meet detection requirements; supervised detection algorithms require a large amount of labeled data, and spammers frequently change strategies to evade the detection system, which leads to low efficiency. To address these problems, this thesis makes the following contributions:

1. To address the feature selection problem in spammer detection, this thesis proposes a feature selection algorithm named CFR-GA, which combines comprehensive filter ranking (CFR) with a genetic algorithm (GA), and applies it to spammer detection. First, the filter-based CFR algorithm computes a comprehensive score for each feature and sorts the features in descending order of score; the lowest-ranked features are then deleted to reduce the search range of the GA. Second, the comprehensive scores guide the initialization of the GA population, which improves the running efficiency of the GA. Finally, the GA searches for the optimal feature subset. The experimental results show that the feature subset obtained by CFR-GA has fewer dimensions and better classification performance, and that CFR-GA is more efficient than GA alone.

2. To address the problem of manually labeling data in spammer detection, this thesis proposes a novel spammer detection algorithm named OSHCM, based on ordering points to identify the clustering structure (OPTICS) and the support vector machine (SVM). First, the OPTICS algorithm generates clusters, which yield the initial class labels of the data. Second, reliable learning samples are selected according to the density of the samples within the clusters. Third, a feature subset is generated by CFR-GA. Finally, the training samples and the feature subset are used to train an SVM classifier, which is then used to reclassify the original data. The experimental results show that, without a labeled dataset, the detection performance of the algorithm is close to that of SVM and is a substantial improvement over OPTICS.
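The three CFR-GA steps described in point 1 (rank and prune, score-guided initialization, GA search) can be illustrated with a small sketch. The abstract does not say which filter measures CFR combines or how the GA evaluates a subset, so everything specific here is an assumption made for illustration: the two filter scores (a Fisher-style score and absolute Pearson correlation), the nearest-centroid fitness with a small size penalty, the parameter values, and the helper names `composite_scores`, `cfr_ga`, and `make_toy_data` are all hypothetical and are not taken from the thesis.

```python
import random
import statistics

def fisher_score(col, y):
    """Assumed filter 1: squared class-mean gap over within-class variance."""
    a = [v for v, t in zip(col, y) if t == 1]
    b = [v for v, t in zip(col, y) if t == 0]
    spread = statistics.pvariance(a) + statistics.pvariance(b)
    return (statistics.mean(a) - statistics.mean(b)) ** 2 / (spread + 1e-12)

def abs_correlation(col, y):
    """Assumed filter 2: absolute Pearson correlation with the label."""
    mx, my = statistics.mean(col), statistics.mean(y)
    cov = sum((v - mx) * (t - my) for v, t in zip(col, y))
    den = (sum((v - mx) ** 2 for v in col) * sum((t - my) ** 2 for t in y)) ** 0.5
    return abs(cov) / (den + 1e-12)

def composite_scores(X, y):
    """Average the min-max normalised filter scores, giving a value in [0, 1]."""
    cols = list(zip(*X))
    total = [0.0] * len(cols)
    for scorer in (fisher_score, abs_correlation):
        raw = [scorer(c, y) for c in cols]
        lo, hi = min(raw), max(raw)
        for j, r in enumerate(raw):
            total[j] += (r - lo) / (hi - lo + 1e-12) / 2
    return total

def centroid_accuracy(X, y, feats):
    """Assumed wrapper fitness: training accuracy of a nearest-centroid classifier."""
    if not feats:
        return 0.0
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [statistics.mean(r[j] for r in rows) for j in feats]
    hits = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in cents}
        hits += min(dist, key=dist.get) == t
    return hits / len(y)

def cfr_ga(X, y, keep=4, pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    scores = composite_scores(X, y)
    # Step 1: keep only the top-ranked features to shrink the GA search space.
    cand = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:keep]

    def fitness(mask):
        feats = [f for f, bit in zip(cand, mask) if bit]
        return centroid_accuracy(X, y, feats) - 0.01 * len(feats)

    # Step 2: score-guided initialisation; higher-scoring features are more
    # likely to start switched on (probability between 0.2 and 0.8).
    pop = [[int(rng.random() < 0.2 + 0.6 * scores[f]) for f in cand]
           for _ in range(pop_size)]
    # Step 3: GA search with elitist selection, one-point crossover, bit-flip mutation.
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, len(cand))
            child = p[:cut] + q[cut:]
            if rng.random() < 0.1:
                child[rng.randrange(len(cand))] ^= 1
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return sorted(f for f, bit in zip(cand, best) if bit)

def make_toy_data(n=40, n_features=6, seed=1):
    """Hypothetical data: feature 0 tracks the label, the others are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[3.0 * t + rng.gauss(0, 0.1)] + [rng.random() for _ in range(n_features - 1)]
         for t in y]
    return X, y

X, y = make_toy_data()
print("selected feature indices:", cfr_ga(X, y))
```

Pruning before the search is what shrinks the GA's bit strings from one bit per original feature to `keep` bits, and seeding the population from the scores starts the search near subsets the filters already favour. The thesis's actual scoring and fitness functions may differ from the stand-ins used here.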
Keywords/Search Tags: social network, spammer detection, feature selection, machine learning