A Method Of Identifying Microblog Spammers Based On Support Vector Machine

Posted on:2014-09-05

Degree:Master

Type:Thesis

Country:China

Candidate:X Chen


GTID:2268330401479415

Subject:Computer application technology

Abstract/Summary:

Microblogging has recently become the most popular form of online social communication. However, the spam users that have emerged along with online social networks seriously degrade the microblogging experience of ordinary users. A common approach is to identify spammers with machine learning algorithms based on extracted features, and how the features are chosen and extracted plays an important role in accuracy. Existing studies, however, mainly target English microblogs such as Twitter. Given the lack of research on Chinese microblogs, this paper studies the problem in depth using machine learning. It also analyzes and applies the relevant learning algorithms in detail, and designs and implements a simple, feasible spammer recognition system. Our work is as follows:

1) Data acquisition. Our experiments mainly use the Java SDK of the Sina microblog open API for data acquisition. We obtain the experimental data through the relevant interfaces, and we use several access tokens to collect more data.

2) Data preprocessing and feature extraction. This paper proposes a new method that uses Chinese text similarity based on the vector space model (VSM), long and short URL similarity, posting patterns, and other features to achieve higher precision. The method first extracts useful status content and user information from the previously obtained data, and then transforms it into a vector that serves as the input to the classifier. Finally, the classification model is trained with the LibSVM tools.

3) Classification. For a new sample, after preprocessing and feature extraction of the microblog content and user information, the previously trained model is used to judge whether the sample is a spammer.

4) System construction. This paper describes the structure and workflow of the system in detail. The system, composed of several modules, is implemented in Java and is able to obtain user-related data, build the classification model, and make a judgment.

The experimental results show that the proposed method is highly effective at recognizing spammers. Although our experiments were conducted on Sina microblog data, the method is applicable to Chinese microblogs in general.
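The feature-extraction step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it is written in Python for brevity (the thesis's system is in Java), it assumes character bigrams as VSM terms so that no Chinese word segmenter is needed, and it assumes raw term counts rather than any particular weighting scheme. All function names and the sample feature values are hypothetical. The only format taken as given is LibSVM's sparse text input, `<label> <index>:<value> ...` with 1-based indices.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def cosine_similarity(a, b):
    """Cosine similarity of two posts represented as bigram-count vectors."""
    va, vb = Counter(char_bigrams(a)), Counter(char_bigrams(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or one-character post has no bigrams
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(posts):
    """Average similarity over all pairs of a user's posts.

    Assumption: near-duplicate posting pushes this value toward 1,
    which is why it is used here as a single spammer feature.
    """
    pairs = [(i, j) for i in range(len(posts)) for j in range(i + 1, len(posts))]
    if not pairs:
        return 0.0
    return sum(cosine_similarity(posts[i], posts[j]) for i, j in pairs) / len(pairs)


def to_libsvm_line(label, features):
    """Format one sample as a LibSVM input line with 1-based feature indices."""
    parts = [f"{i}:{v:.4f}" for i, v in enumerate(features, start=1)]
    return " ".join([str(label)] + parts)


if __name__ == "__main__":
    # Hypothetical user: three posts, two of them near-duplicates.
    posts = ["免费领取优惠券 点击链接", "免费领取优惠券 点击这里", "今天天气很好"]
    text_sim = mean_pairwise_similarity(posts)
    url_sim = 0.8  # placeholder for a URL-similarity feature, not computed here
    print(to_libsvm_line(1, [text_sim, url_sim]))
```

Lines produced this way, one per labeled user, could be written to a file and passed to the LibSVM training tool. The remaining features the abstract names (long and short URL similarity, posting patterns) would be appended as further entries in the same vector.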

Keywords/Search Tags:

Chinese Microblog, Spammer, Support Vector Machine, features, classify

Related items

1	The Research Of Support Vector Machine Based On Fuzzy Clustering In Classify Algorithm
2	Combinating Of Rules And Statistics For New Words Detection Of Microblog Text
3	A Chinese OCR System Based On Gabor Features And SVM
4	Research On Some Problems Of Support Vector Machine Learning Algorithm
5	The Research And Implementation On The Technology Of Spammer Detection For Sina Microblog
6	The Method To Identify Spammers In Microblog
7	Research On Support Vector Techniques And Their Applications
8	Research On Chinese Text Categorization Based On The Integrated Support Vector Machine Method
9	Model And Detection On Microblogging Spammer Behavior Based On Microblogging Data
10	Research On The Detecting Of Spammers In The Microblog Network