
Behavior Analysis And Detection Of Micro-Blog Spammers

Posted on: 2017-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Sun
GTID: 2308330485476117
Subject: Computer technology
Abstract/Summary:
With the continuing development of the Internet, more and more organizations and individuals obtain information through social networks. The user community of social networks is huge, and user relations are usually based on real social relationships, such as classmates, friends or relatives, so information disseminated through them is more likely to be accepted. Consequently, many malicious users resort to cheating to obtain improper benefits, releasing links or micro-blogs that carry viruses, violence, pornography and other harmful content. However, with the continuous improvement of anti-cheating technology and of social network systems themselves, overt cheating of this kind has almost disappeared. Spamming has instead become more hidden, less directly harmful, and more indirect in its effect on normal users, which is reflected in reduced efficiency with which social network users absorb information. This thesis defines different types of spam micro-blog users from the perspective of the scale of impact and the initiative of impact, and analyzes the behavior of each type.
This thesis designs a large-scale parallel micro-blog crawler, which crawled over 5 million micro-blogs, and expands the original data set. The new data set is pre-processed and features are extracted from four aspects: the user's personal information, user behavior, user relationships and the text of the user's micro-blogs. From these, a Chinese micro-blog sample set is constructed that contains typical micro-blog spam users, such as passive marketing users who promote mobile-phone advertisers and passive promotional users who publicize celebrities and sports events. This thesis also performs word segmentation and topic extraction on micro-blog content, and constructs a thesaurus based on that content.
On the basis of the work above, and after balancing the data set, the contributions of different feature combinations are compared experimentally, and the combination that gives the best detection performance is selected. This thesis then compares the feature differences between each pair of subclasses and accordingly devises a multivariate (multi-class) SVM classification algorithm to classify the data set. The performance of this algorithm is compared with that of other classification algorithms, and the results show that it has advantages over them. Finally, this thesis designs a multivariate SVM classification algorithm based on synthesized weights. For every pair of subclasses, it counts the samples of each class that are misclassified into the other, computes a weight for each sub-classifier from these counts, and adds the weight to the objective function used for classification, which improves classification accuracy.
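The pairwise weighting idea can be illustrated with a small sketch. The abstract does not give the weight formula, and it states that the weights enter the objective function; the code below is a simplification that instead applies them when the pairwise sub-classifiers vote, so it should not be read as the author's algorithm. The class names, the confusion counts, the weight formula (one minus the cross-misclassification rate) and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_weights(confusion, classes):
    """Weight each pairwise sub-classifier by how rarely its two classes
    are misclassified into each other (assumed formula, not from the thesis)."""
    weights = {}
    for a, b in combinations(classes, 2):
        # Total samples of the two classes, and those crossing between them.
        total = sum(confusion[a].values()) + sum(confusion[b].values())
        crossed = confusion[a].get(b, 0) + confusion[b].get(a, 0)
        weights[(a, b)] = 1.0 - crossed / total if total else 0.0
    return weights


def weighted_vote(pair_predictions, weights):
    """Combine the winners of the pairwise sub-classifiers by weighted voting."""
    scores = defaultdict(float)
    for pair, winner in pair_predictions.items():
        scores[winner] += weights[pair]
    return max(scores, key=scores.get)


# Hypothetical classes and validation confusion counts: confusion[true][predicted].
classes = ["normal", "marketing", "propaganda"]
confusion = {
    "normal": {"normal": 90, "marketing": 5, "propaganda": 5},
    "marketing": {"normal": 10, "marketing": 60, "propaganda": 30},
    "propaganda": {"normal": 5, "marketing": 25, "propaganda": 70},
}
weights = pairwise_weights(confusion, classes)

# Hypothetical pairwise outputs for one sample: each class wins once,
# so unweighted voting would be a three-way tie.
pair_predictions = {
    ("normal", "marketing"): "marketing",
    ("normal", "propaganda"): "normal",
    ("marketing", "propaganda"): "propaganda",
}
label = weighted_vote(pair_predictions, weights)
```

In this toy example the sub-classifier separating marketing from propaganda users receives the lowest weight because those two classes are confused most often, so its vote counts least when the pairwise decisions disagree.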
Keywords/Search Tags:SVM, weight, multiple classification, weibo benchmark, feature selection