Social Spammer Detection Based On Co-training

Posted on:2017-05-25

Degree:Master

Type:Thesis

Country:China

Candidate:H J Bai

Full Text:PDF

GTID:2348330533950981

Subject:Software engineering

Abstract/Summary:

In recent years, as Web 2.0 technology has developed and matured, social networks have become a highly convenient tool for communication and exchange. However, large volumes of social spam and large numbers of spammers seriously disrupt communication between users. They not only consume substantial network resources but can also harm the rights of legitimate users. Existing spam and spammer detection techniques usually apply supervised learning to large amounts of labeled data. Manually labeling a dataset, however, is complex and error-prone, and it consumes considerable manpower and material resources. It is therefore necessary to study how to detect spam and spammers using less labeled data.

To address these problems, this thesis proposes a semi-supervised classification framework for detecting spammers in social networks. The framework combines co-training with a clustering algorithm. First, the k-medoids clustering algorithm is used to identify informative and representative samples, which are labeled to form the initial seed set for semi-supervised learning. The content and behavior features of users are then used for co-training. The framework repeatedly predicts users' labels and selects the users whose prediction confidence meets a given threshold to retrain the model; an optimized classification model is obtained through successive iterations.

This thesis first introduces the harm caused by social spam and the necessity of detecting social spammers. It then summarizes existing techniques for detecting social spam and the related theory, and describes the algorithm and implementation of the co-training-based semi-supervised detection framework. Finally, experiments on a real Twitter dataset verify the effectiveness and correctness of the framework. The proposed framework trains the model correctly and achieves good test results with only a few labeled samples.
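
The pipeline described in the abstract (k-medoids seed selection, then confidence-thresholded co-training over content and behavior features) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The nearest-centroid classifier, the farthest-first medoid initialisation, the confidence formula, the 0.8 threshold, the shared labeled pool, and the toy feature values are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, iters=20):
    """Return sorted indices of k medoids (simple alternating k-medoids)."""
    medoids = [0]  # assumption: deterministic farthest-first initialisation
    while len(medoids) < k:
        rest = [i for i in range(len(points)) if i not in medoids]
        medoids.append(max(
            rest, key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(iters):
        # Assign every non-medoid point to its nearest medoid.
        clusters = {m: [m] for m in medoids}
        for i, p in enumerate(points):
            if i not in clusters:
                nearest = min(medoids, key=lambda m: dist(p, points[m]))
                clusters[nearest].append(i)
        # Move each medoid to the member with the smallest total distance.
        new = [min(c, key=lambda j: sum(dist(points[j], points[q]) for q in c))
               for c in clusters.values()]
        if set(new) == set(medoids):
            break
        medoids = new
    return sorted(medoids)

def fit_centroids(X, labeled):
    """Per-class mean vector computed from the currently labeled indices."""
    sums, counts = {}, {}
    for i, y in labeled.items():
        acc = sums.setdefault(y, [0.0] * len(X[i]))
        for d, v in enumerate(X[i]):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence is an illustrative distance ratio."""
    d = {y: dist(x, c) for y, c in centroids.items()}
    label = min(d, key=d.get)
    total = sum(d.values())
    return label, (1.0 if total == 0 else 1.0 - d[label] / total)

def co_train(content, behavior, seeds, threshold=0.8, max_rounds=10):
    """Grow a shared labeled pool using one classifier per feature view."""
    views = (content, behavior)
    labeled = dict(seeds)
    for _ in range(max_rounds):
        models = [fit_centroids(v, labeled) for v in views]
        added = {}
        for i in range(len(content)):
            if i in labeled:
                continue
            # Each view predicts; keep the more confident of the two.
            label, conf = max((predict(m, v[i]) for m, v in zip(models, views)),
                              key=lambda t: t[1])
            if conf >= threshold:
                added[i] = label
        if not added:  # nothing confident enough: stop iterating
            break
        labeled.update(added)
    return labeled

# Toy data (hypothetical features): four legitimate users, then four spammers.
content = [[0.10, 0.05], [0.15, 0.10], [0.05, 0.12], [0.12, 0.08],
           [0.90, 0.85], [0.95, 0.80], [0.85, 0.90], [0.88, 0.82]]
behavior = [[0.20, 0.10], [0.25, 0.15], [0.15, 0.05], [0.22, 0.12],
            [0.90, 0.95], [0.85, 0.90], [0.95, 0.85], [0.92, 0.88]]
truth = ["legit"] * 4 + ["spam"] * 4  # stands in for manual labeling

joined = [c + b for c, b in zip(content, behavior)]
seeds = k_medoids(joined, k=2)  # only these users are "manually" labeled
result = co_train(content, behavior, {i: truth[i] for i in seeds})
print(result)
```

Only the two medoid users receive a manual label here; every other label comes from the confidence-thresholded iterations. On real data the threshold trades labeling coverage against the risk of reinforcing early mistakes.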

Keywords/Search Tags:

Social Spams, Semi-supervised Learning, Co-training, K-medoids

Related items

1	Research On Semi-supervised Self-training Method
2	Research On Semi-supervised Learning Classification Algorithm
3	Research On Transfer Learning Algorithm Based On Semi-supervised Tri-training
4	Study On Semi-supervised Recommendation Method Based On Co-training
5	Research On Semi-supervised Learning Algorithm Based On Tri-training Algorithm
6	The Research On Semi-supervised Collaboration-training Algorithm
7	Research Of Reliable Semi-supervised Classification
8	Object Classification Based On Semi-supervised Learning
9	Comparison And Improvement Of Two Methods Based On Semi-Supervised Learning