
A Method of Identifying Microblog Spammers Based on Support Vector Machine

Posted on: 2014-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
Full Text: PDF
GTID: 2268330401479415
Subject: Computer application technology
Abstract/Summary:
Microblogging has recently become one of the most popular forms of online social communication. However, the spam accounts that have emerged alongside online social networks seriously degrade the experience of ordinary microblog users.

A common approach is to identify spammers with a machine learning algorithm trained on extracted features, so the choice of features and the way they are extracted play an important role in accuracy. Most existing studies, however, focus on English microblogs such as Twitter. Given the lack of research on Chinese microblogs, this paper studies the problem in depth using machine learning. It also analyzes and applies the relevant learning algorithms in detail, and designs and implements a simple, feasible spammer recognition system. Our work is as follows:

1) Data acquisition. Our experiment mainly uses the Java SDK of the Sina microblog open API for data acquisition. We obtain the experimental data through the relevant interfaces, and we use several access tokens to collect more data.

2) Data preprocessing and feature extraction. This paper proposes a new method that uses Chinese text similarity based on the vector space model (VSM), long and short URL similarity, posting regularity, and other features to achieve higher precision. The method first extracts the useful status content and user information from the previously obtained data, and then transforms them into a vector that serves as the input to the classifier. Finally, we use the LibSVM tools to obtain the classification model.

3) Classification. For a new sample, after preprocessing and feature extraction of the microblog content and user information, we use the previously obtained model to judge whether the sample is a spammer.

4) System construction. This paper describes the structure and workflow of the system in detail. The system consists of several modules, is implemented in Java, and is able to obtain user-related data, construct the classification model, and make a judgment.

The experimental results show that the proposed method is effective at recognizing spammers. Although our experiment is based on Sina microblog data, the method generalizes to Chinese microblogs.
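
As an illustration of step 2, the following minimal sketch shows how a VSM-based text similarity could be computed and how a feature vector could be written in LibSVM's sparse text format. It is not the thesis's implementation (which is written in Java). The character-bigram tokenizer, the choice of mean pairwise similarity as a feature, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter
from itertools import combinations


def char_bigrams(text):
    """Term-frequency vector of overlapping character bigrams.

    Assumption: bigrams stand in for a real Chinese word segmenter,
    since Chinese text has no spaces between words.
    """
    text = "".join(text.split())  # drop whitespace
    if len(text) < 2:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(posts):
    """Hypothetical feature: average similarity over all pairs of posts."""
    vectors = [char_bigrams(p) for p in posts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def to_libsvm_line(label, features):
    """Format one sample as 'label index:value ...' with 1-based indices.

    Zero-valued features are omitted, as the sparse format allows.
    """
    parts = [f"{i}:{v:.6g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + parts)


if __name__ == "__main__":
    posts = ["点击链接领取优惠券", "点击链接领取红包", "今天天气很好"]
    similarity = mean_pairwise_similarity(posts)
    # Label 1 marks a spammer here; the labeling convention is an assumption.
    print(to_libsvm_line(1, [similarity]))
```

An account that repeatedly posts near-identical text would score close to 1 on this feature, while an account with varied posts would score close to 0. The resulting lines can be collected into a training file for LibSVM.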
Keywords/Search Tags: Chinese Microblog, Spammer, Support Vector Machine, features, classify