Social Spammer Detection Based On Co-training

Posted on: 2017-05-25, Degree: Master, Type: Thesis
Country: China, Candidate: H J Bai, Full Text: PDF
GTID: 2348330533950981, Subject: Software engineering
Abstract/Summary:
In recent years, as Web 2.0 technology has developed and matured, social networks have become a convenient tool for communication and exchange. However, large volumes of social spam and large numbers of spammers seriously disrupt communication between users. They not only consume substantial network resources but can also harm the rights of legitimate users. Existing spam and spammer detection techniques usually rely on supervised learning over large amounts of labeled data. Manually labeling a dataset, however, is complex and error-prone, and it consumes considerable manpower and material resources. It is therefore necessary to study how to detect spam and spammers using less labeled data.

To address these problems, this thesis proposes a semi-supervised classification framework for detecting spammers in social networks. The framework combines co-training with a clustering algorithm. First, the k-medoids clustering algorithm is used to identify informative and representative samples, which are labeled and used as the initial seed set for semi-supervised learning. The content and behavior features of users are then used for co-training. The framework repeatedly predicts users' labels and selects the users whose prediction confidence meets a given threshold to retrain the model; an optimized classification model is obtained through successive iterations.

This thesis first introduces the harm caused by social spam and the need to detect social spammers. It then summarizes existing social spam detection techniques and the related theory, and describes the algorithm and implementation of the co-training-based semi-supervised detection framework. Finally, experiments on a real Twitter dataset verify the effectiveness and correctness of the framework. The proposed framework can train the model correctly and achieve good test results with only a few labeled samples.
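
The pipeline described above (k-medoids seed selection followed by confidence-thresholded co-training on content and behavior features) might be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the toy feature values, the nearest-centroid classifier, the farthest-first medoid initialization, the confidence formula, the 0.75 threshold, and all function names are assumptions made for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, max_iter=20):
    """Return indices of k medoids (assumed init: greedy farthest-first from point 0)."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        # Move each medoid to the member with the smallest total distance to its cluster.
        updated = [min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
                   for members in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return medoids

class CentroidClassifier:
    """Stand-in base learner (an assumption; the abstract names no classifier)."""
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        d = {label: dist(x, c) for label, c in self.centroids.items()}
        label = min(d, key=d.get)
        total = sum(d.values())
        # Assumed confidence score: share of total distance owed to the other classes.
        conf = 1.0 - d[label] / total if total > 0 else 1.0
        return label, conf

def co_train(views, seeds, threshold=0.75, max_rounds=10):
    """Grow the labeled set with confident predictions from one classifier per view."""
    labeled = dict(seeds)
    for _ in range(max_rounds):
        added = {}
        for X in views:
            idx = sorted(labeled)
            clf = CentroidClassifier().fit([X[i] for i in idx], [labeled[i] for i in idx])
            for i in range(len(X)):
                if i in labeled or i in added:
                    continue
                label, conf = clf.predict_with_confidence(X[i])
                if conf >= threshold:
                    added[i] = label
        if not added:
            break
        labeled.update(added)  # pseudo-labels feed both views in the next round
    return labeled

# Toy data (hypothetical): two content features and two behavior features per user.
content = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5), (5.2, 5.8)]
behavior = [(0.1, 0.9), (0.9, 0.1), (0.5, 0.4), (0.3, 0.3), (0.6, 0.6), (1.0, 0.0),
            (5.1, 5.9), (5.9, 5.1), (5.5, 5.4), (5.3, 5.3), (5.6, 5.6), (6.0, 5.0)]
truth = ["legit"] * 6 + ["spam"] * 6

joint = [c + b for c, b in zip(content, behavior)]
seed_ids = k_medoids(joint, k=2)          # representative users to label by hand
seeds = {i: truth[i] for i in seed_ids}   # ground truth stands in for manual labeling
predicted = co_train([content, behavior], seeds)
```

In this sketch only the two medoid users receive "manual" labels; every other label comes from confident predictions. A real dataset would need a stronger base classifier and a more careful confidence estimate than the distance ratio assumed here.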
Keywords/Search Tags: Social Spams, Semi-supervised Learning, Co-training, K-medoids