Research On Spammer Detection Methods In Online Forums

Posted on: 2019-07-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G R Chen
GTID: 1368330623453332
Subject: Computer Science and Technology
Abstract/Summary:
Spammers and their "flood" posting behavior seriously harm network information security and cause a series of problems in cyberspace: the proliferation of false information, severe polarization among netizens, malicious manipulation of public opinion by unlawful domestic and foreign actors, infiltration by hostile foreign forces, damage to the interests of the majority of internet users, erosion of government credibility, and threats to social harmony and stability. Governing spammers is therefore urgent. In recent years, spammer detection has attracted extensive attention from scholars at home and abroad. However, existing research concentrates mainly on e-mail, social networking, and e-commerce sites, and research on spammer detection in online forums is still rare. This paper targets spammer detection in online forums. Starting from an analysis of the evolution of online public opinion, the organization of forum information, and the behavioral characteristics of forum users, it studies methods for detecting forum spammers. The main contributions of this paper are as follows:

This paper presents a model of online public opinion evolution and spammer influence with a dual selection mechanism based on influence and a trust threshold, in which high influence takes priority. Because the number of forum users is huge and the time and energy of each user are limited, users cannot and will not consider the views of all other users; instead they tend to reference the opinions of users who have greater influence and a smaller opinion distance. On this basis we model the users' opinion interaction strategy and identify three factors that determine the evolution of online public opinion and its outcome. We then establish the spammer influence model and, by adding spammers to the network, analyze their impact on the process and outcome of public opinion evolution. The simulation results show that spammers affect the direction in which public opinion evolves and that the convergence speed of public opinion is positively related to the number of spammers. This reveals the propagation mechanism of spammers and provides a theoretical basis for research on spammer detection methods.

This paper reveals regularities in online user behavior and presents a spammer detection algorithm based on statistical analysis and "co-post" behavior analysis. Empirical statistical analysis finds that many statistical indicators of users, posts, and online forums follow a power-law distribution. Spammer activity hides behind a small number of active users and main posts, so normal users and posts should be excluded as early as possible to reduce the scope of computation. Based on this idea, we build a "three-step" forum spammer detection algorithm that excludes normal users and data three times and approaches the spammers gradually. In the first step, we analyze abnormal indicators of the forum and exclude the time spans in which spam activity cannot have happened, reducing the scope of computation for the first time. In the second step, we construct the co-post network, calculate the degree of collusion among users, and identify highly suspicious users from the pruned co-post network, reducing the scope of computation for the second time. In the final step, we determine the spammers according to time characteristics. The experimental results show that the algorithm effectively identifies spammers who act in groups and reply to the same group of posts, with high speed and high accuracy.

This paper proposes a user behavior similarity index and, based on this index, designs and implements a spammer detection algorithm. Because spammers use posting robots to control a large number of spam accounts that post toward a single post, we propose a novel user behavior similarity index that describes the similarity of users' reply behaviors from three aspects. On this basis we design a spammer detection algorithm that uses a "divide and conquer, parallel processing" strategy: it analyzes multiple discussion threads in parallel, computes the similarity of each pair of users in a thread, constructs a user similarity network, and finds spammers by clustering the pruned network. The experimental results show that this algorithm effectively detects spammers who reply to the same root post repeatedly. The algorithm also has good parallelism and scalability and can handle spammer detection on large-scale data sets.

This paper proposes a suspicious degree transfer model and, based on this core idea, designs and implements a semi-supervised spammer detection algorithm. We regard the user-post network as a Markov random field. Relying on the observation that "good users tend to reply to good discussion threads, while spammers tend to target spam ones", and using Markov network theory and the belief propagation model, we put forward a propagation model for the suspicious degree of nodes. On this basis we design and implement a spammer detection algorithm that takes the user-post network, the initial suspicion of the nodes, and a transfer matrix as input and determines the type of each node by iterative calculation. The algorithm is a semi-supervised learning method: it can learn from a small amount of annotated data and a large amount of unannotated data. The annotated data can be the results of other algorithms or manual annotation. In the absence of prior knowledge, the algorithm can also run independently. To reduce the algorithm's dependence on external knowledge and to improve both detection performance and the convergence speed of the iteration, a group of user and post features is proposed and used to calculate the initial suspicion of the nodes. The experimental results show that the algorithm outperforms a support vector machine that uses the same group of features.
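The dual selection idea in the first contribution (consider only highly influential users, then only those within a trust threshold) can be illustrated with a small deterministic simulation. This is a minimal sketch under stated assumptions, not the model from the dissertation: the function name `simulate_opinions`, the parameters `top_k`, `trust`, `rate`, and `rounds`, the synchronous update rule, and the treatment of spammers as users with a fixed opinion are all hypothetical choices made for illustration.

```python
def simulate_opinions(opinions, influence, spammers=(), top_k=3,
                      trust=0.3, rate=0.5, rounds=50):
    """Synchronous bounded-confidence dynamics with influence-first selection.

    opinions:  list of floats in [0, 1], one per user.
    influence: list of non-negative influence scores, one per user.
    spammers:  indices of users whose opinion never changes (an assumption).
    """
    opinions = list(opinions)
    fixed = set(spammers)
    # Users ranked by influence, most influential first.
    ranked = sorted(range(len(opinions)), key=lambda i: -influence[i])

    for _ in range(rounds):
        updated = list(opinions)
        for i in range(len(opinions)):
            if i in fixed:
                continue
            # Selection 1: only the top_k most influential other users.
            candidates = [j for j in ranked if j != i][:top_k]
            # Selection 2: keep those whose opinion is within the trust threshold.
            trusted = [j for j in candidates
                       if abs(opinions[j] - opinions[i]) <= trust]
            if trusted:
                target = sum(opinions[j] for j in trusted) / len(trusted)
                updated[i] += rate * (target - opinions[i])
        opinions = updated
    return opinions


normal = [0.4, 0.5, 0.6, 0.45, 0.55]
without = simulate_opinions(normal, [5, 4, 3, 2, 1], top_k=2, trust=0.6)
# Two hypothetical spammers with a fixed opinion of 0.9 and the highest influence.
with_spam = simulate_opinions(normal + [0.9, 0.9], [5, 4, 3, 2, 1, 10, 10],
                              spammers=(5, 6), top_k=2, trust=0.6)
print([round(x, 3) for x in without])
print([round(x, 3) for x in with_spam])
```

In this toy setting the normal users stay inside their initial opinion range when no spammers are present, and are pulled to the spammers' opinion when the spammers are the most influential accounts. It only illustrates that spammers can change the direction of opinion evolution; it makes no claim about the convergence-speed result reported in the abstract.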
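The second step of the "three-step" algorithm, building and pruning a co-post network, can be sketched as follows. The abstract does not define how the degree of collusion is computed, so this sketch assumes the simplest possible weight: the number of threads in which two users both replied. The function names and the pruning threshold `min_shared` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_post_pairs(replies, min_shared=2):
    """Return user pairs that replied to at least `min_shared` common threads.

    `replies` is an iterable of (user, thread_id) tuples. The threshold
    `min_shared` is an assumed pruning parameter, not taken from the source.
    """
    users_by_thread = defaultdict(set)
    for user, thread in replies:
        users_by_thread[thread].add(user)

    # Edge weight = number of threads in which both users replied.
    weight = defaultdict(int)
    for users in users_by_thread.values():
        for u, v in combinations(sorted(users), 2):
            weight[(u, v)] += 1

    # Prune weak edges so that only repeated co-posting remains.
    return {pair: w for pair, w in weight.items() if w >= min_shared}


def suspicious_users(pruned_edges):
    """Users that still have at least one edge after pruning."""
    return {user for pair in pruned_edges for user in pair}


replies = [
    ("a", "t1"), ("b", "t1"), ("c", "t1"),
    ("a", "t2"), ("b", "t2"),
    ("a", "t3"), ("b", "t3"), ("d", "t3"),
]
edges = co_post_pairs(replies, min_shared=2)
print(edges)                            # {('a', 'b'): 3}
print(sorted(suspicious_users(edges)))  # ['a', 'b']
```

Users `c` and `d` each share only one thread with anyone else, so their edges are pruned and they drop out of the candidate set. The time-span filtering of the first step and the time-characteristic check of the third step are not covered here.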
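The suspicion propagation in the fourth contribution takes a user-post network, initial node suspicion, and a transfer matrix as input. The sketch below runs standard loopy belief propagation with two states (normal, spam) on such a network. It is an illustration under assumptions, not the dissertation's algorithm: the function name `propagate_suspicion`, the transfer matrix values, the synchronous message schedule, and the fixed iteration count are all hypothetical.

```python
import math


def propagate_suspicion(edges, prior,
                        transfer=((0.9, 0.1), (0.1, 0.9)), iterations=20):
    """Loopy belief propagation on a user-post graph with two states.

    edges:    list of (user, post) pairs, one per reply relation.
    prior:    dict node -> initial probability of being spam, strictly
              between 0 and 1; nodes that are missing default to 0.5.
    transfer: transfer[x][y] is the assumed compatibility of a node in
              state x with a neighbor in state y (0 = normal, 1 = spam).
    Returns dict node -> estimated probability of being spam.
    """
    neighbors = {}
    for u, p in edges:
        neighbors.setdefault(u, set()).add(p)
        neighbors.setdefault(p, set()).add(u)

    phi = {n: (1.0 - prior.get(n, 0.5), prior.get(n, 0.5)) for n in neighbors}

    # msg[(i, j)] is the message from node i to node j about j's state.
    msg = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_msg = {}
        for (i, j) in msg:
            out = [0.0, 0.0]
            for xi in (0, 1):
                # Product of messages into i from all neighbors except j.
                incoming = math.prod(msg[(k, i)][xi]
                                     for k in neighbors[i] if k != j)
                for xj in (0, 1):
                    out[xj] += phi[i][xi] * transfer[xi][xj] * incoming
            total = out[0] + out[1]
            new_msg[(i, j)] = (out[0] / total, out[1] / total)
        msg = new_msg

    belief = {}
    for n in neighbors:
        b = [phi[n][x] * math.prod(msg[(k, n)][x] for k in neighbors[n])
             for x in (0, 1)]
        belief[n] = b[1] / (b[0] + b[1])
    return belief


# Chain u1 - p1 - u2 - p2 - u3, with u1 labeled likely spam and u3 likely normal.
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2")]
belief = propagate_suspicion(edges, {"u1": 0.9, "u3": 0.1})
for node in sorted(belief):
    print(node, round(belief[node], 3))
```

Only two nodes carry labels, which mirrors the semi-supervised setting: the unlabeled post `p1` next to the suspected spammer ends up more suspicious than `p2`, while `u2`, sitting midway between the two labeled users, stays undecided. With all priors left at 0.5 the function still runs, but every belief remains 0.5, so in practice the feature-based initial suspicion mentioned in the abstract would be needed.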
Keywords/Search Tags:Network forum, Spammer detection, User behavior analysis, User behavior similarity calculation, Semi-supervised learning