Online News Comment User Behavior Analysis And Spammer Identification

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: W Sun
Full Text: PDF
GTID: 2428330605466657
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, online information has entered the era of big data, and Internet media, as the main channel of information dissemination, has changed tremendously. Online news, social networks, forums, and similar platforms have gradually become the main media for information dissemination and user communication. However, the business opportunities behind big data have also given rise to a large number of spammers. They publish false statements, advertisements, illegal content, and phishing links on these media for commercial gain, seriously damaging the experience of normal users. To reduce the harm caused by false information, methods for identifying spammers have been studied extensively.

At present, the comment behavior of online news users has been analyzed far less than the behavior of social network users, so its underlying distribution patterns have not been established. Research on spammer identification is likewise concentrated on social networks, with much less work on online news. Moreover, the traditional methods for identifying online news comment spammers require costly data and perform poorly, and need to be improved.

This dissertation takes online news comment users as its research object. It collects online news and comments with web crawler technology and studies the distribution patterns of user comment behavior. It also extracts a set of behavioral and semantic features from user comment behavior and comment content, and uses machine learning to classify unknown users. The main research contents of this dissertation are as follows:

(1) The distribution features of online news user comment behavior are analyzed. First, a web crawler system is developed to collect news and comments from a specific online news website. Then the user, temporal, and news distribution features of user comment behavior are analyzed empirically. It is found that a small number of users account for the vast majority of news comments. User comment behavior also varies along the temporal dimension according to how active the users are, and users with different activity levels behave differently on news items of different popularity.

(2) A method for identifying online news comment spammers based on the label propagation algorithm is proposed. First, building on the comment behavior of online news users, a set of behavioral and semantic features is further analyzed, extracted, quantified, and normalized to obtain the corresponding feature values. Then a spammer identification method is constructed on the two-class label propagation algorithm. Finally, the feature values are fed into the label propagation algorithm in different combinations, and experiments and evaluations are carried out to determine the most effective feature combination and improve the identification method. Experiments show that the Precision, F-measure, and Accuracy of the method improve on those of the traditional online news comment spammer identification method.

(3) An improved spammer identification method based on ensemble learning is proposed. First, using Bagging and simple averaging, a semi-supervised ensemble is built from multiple label propagation classifiers. Then, using Bagging and weighted voting, a supervised ensemble is built from a number of different supervised classifiers. Finally, using Stacking, the semi-supervised ensemble is combined with the supervised ensemble to improve the online news comment spammer identification method. Experiments show that the Precision, F-measure, and Accuracy of the method improve on those of a single supervised classification algorithm.
Keywords/Search Tags:Online News, User Behavior Analysis, Spammer Identification, Label Propagation Algorithm, Ensemble Learning
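
The two-class label propagation step described in part (2) of the abstract can be illustrated with a minimal sketch. The abstract does not give the graph construction, kernel, or parameters, so everything below is an assumption made for illustration: the function name `label_propagation`, the Gaussian affinity with width `sigma`, the clamping of labeled users, and the toy two-feature data are hypothetical and are not taken from the thesis.

```python
import math


def label_propagation(features, labels, sigma=0.3, max_iter=200, tol=1e-6):
    """Two-class label propagation with clamped labels (illustrative sketch).

    features: list of equal-length feature vectors, assumed normalized to [0, 1].
    labels:   1 = spammer, 0 = normal user, None = unlabeled.
    Returns one spammer score in [0, 1] per user.
    """
    n = len(features)

    # Assumed Gaussian affinity between every pair of distinct users.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                weights[i][j] = math.exp(-d2 / (2 * sigma ** 2))

    # Row-normalize affinities into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    # Labeled users start at their label; unlabeled users start undecided.
    scores = [0.5 if y is None else float(y) for y in labels]

    for _ in range(max_iter):
        # Each user takes the affinity-weighted average of its neighbors' scores.
        new = [sum(trans[i][j] * scores[j] for j in range(n)) for i in range(n)]
        # Clamp the known labels so they are never overwritten.
        for i, y in enumerate(labels):
            if y is not None:
                new[i] = float(y)
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy data: two normalized features per user.
toy_features = [[0.9, 0.8], [0.85, 0.9], [0.1, 0.2],
                [0.15, 0.1], [0.8, 0.85], [0.2, 0.15]]
toy_labels = [1, None, 0, None, None, None]
toy_scores = label_propagation(toy_features, toy_labels)
predicted = [1 if s >= 0.5 else 0 for s in toy_scores]
```

In this sketch an unlabeled user is flagged as a spammer when its propagated score reaches 0.5; a real system would choose that threshold and the feature combination through evaluation, as the abstract describes.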