
Research On Commentary Garbage Identification Method Based On Human Dynamics

Posted on: 2018-06-17 | Degree: Master | Type: Thesis
Country: China | Candidate: T Liu
GTID: 2358330515953945 | Subject: Software engineering
Abstract/Summary:
As e-commerce, the mobile Internet and online social media continue to grow, people shop, make friends and find entertainment online, and the Internet has become an important part of everyday life. The comment functions of these platforms let users express their views freely, gradually turning them from information consumers into information contributors, and the Web is now flooded with user-generated content. Spam hidden in this content seriously affects people's daily lives. Identifying it automatically and efficiently by computer is a long-standing challenge and one of the active research problems in text mining and natural language processing.

Building on existing research and on the needs of Internet public-opinion analysis, this thesis takes the news comments and user data of the Netease news portal as its research object and proposes a spam identification method based on human dynamics. The feature space used to build the models is extracted from two perspectives: the users and their comments.

For user features, we first analyze how the behavior of normal users differs from that of spammers, and from this analysis compute each user's individual behavior patterns. These include basic behavioral statistics, such as the numbers of comments, replies, favorites and feeds and the number of comments per day, as well as comment-posting patterns, such as the mean and variance of the time interval between consecutive comments. In addition, we analyze four kinds of user interaction: replying, following, commenting on the same news item and posting similar comments. On the resulting network model, six network-topology measures are used to extract users' interaction features. Finally, we calculate the information value (IV) of the comment text and construct the comment feature space from the related attributes.

Based on the user and comment features, four sets of experiments were designed. GBDT and SVM models were trained on different feature subsets, and comparing the results identified the optimal feature subset for spam identification. The experiments show that the method based on human-dynamics patterns can effectively identify spammers on the platform, especially spammers that exhibit machine-like behavior. Moreover, including user behavior features enables the model to identify review spam with high precision and recall.
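The comment-posting patterns described above (the mean and variance of the interval between a user's consecutive comments) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's code: the helper name `interval_features` is hypothetical, the use of population variance is an assumption, and the Goh-Barabási burstiness coefficient B = (σ − μ)/(σ + μ) is an extra human-dynamics measure added for illustration rather than one the abstract names.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev, pvariance


def interval_features(timestamps):
    """Interval statistics (in seconds) between one user's consecutive comments.

    Hypothetical helper: mean and variance follow the abstract; the
    burstiness coefficient is an added illustration, not from the thesis.
    """
    times = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    if not gaps:  # fewer than two comments: no interval to measure
        return {"mean": 0.0, "variance": 0.0, "burstiness": 0.0}
    mu = mean(gaps)
    sigma = pstdev(gaps)  # population standard deviation (assumption)
    # Goh-Barabasi burstiness: -1 = perfectly regular, about 0 = Poisson-like, toward 1 = bursty
    burstiness = (sigma - mu) / (sigma + mu) if sigma + mu > 0 else 0.0
    return {"mean": mu, "variance": pvariance(gaps), "burstiness": burstiness}


if __name__ == "__main__":
    start = datetime(2017, 1, 1)
    bot = [start + timedelta(seconds=60 * i) for i in range(5)]  # machine-like: fixed 60 s gap
    human = [start + timedelta(seconds=s) for s in (0, 10, 25, 3600, 3610)]  # bursts, then a long pause
    print(interval_features(bot))
    print(interval_features(human))
```

A script posting exactly every 60 seconds gets zero variance and B = −1, whereas bursty, human-like activity pushes B above 0, which is one way the "machine behavior" mentioned in the results could surface as a feature.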
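The information value (IV) step can be illustrated with the standard weight-of-evidence formula, IV = Σ_i (g_i − s_i) ln(g_i / s_i), where g_i and s_i are the shares of normal and spam comments that fall into bin i of a feature. The abstract does not say which text attributes were binned or how, so this sketch assumes an already-discretized attribute (a hypothetical "contains a URL" flag) and adds a smoothing constant `eps`; the helper name, the sample data and the smoothing are all assumptions.

```python
import math
from collections import Counter


def information_value(values, labels, eps=0.5):
    """IV of a pre-binned feature against binary labels (1 = spam, 0 = normal).

    Hypothetical helper; `eps` is additive smoothing (an assumption) that
    keeps empty bins from producing log(0) or division by zero.
    """
    bins = set(values)
    spam = Counter(v for v, y in zip(values, labels) if y == 1)
    normal = Counter(v for v, y in zip(values, labels) if y == 0)
    n_spam, n_normal = sum(spam.values()), sum(normal.values())
    iv = 0.0
    for b in bins:
        g = (normal[b] + eps) / (n_normal + eps * len(bins))  # share of normal comments in bin b
        s = (spam[b] + eps) / (n_spam + eps * len(bins))      # share of spam comments in bin b
        iv += (g - s) * math.log(g / s)                       # (g - s) * WOE, never negative
    return iv


if __name__ == "__main__":
    # Hypothetical attribute: does the comment text contain a URL?
    has_url = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
    is_spam = [1, 1, 1, 0, 0, 0, 0, 0]
    print(round(information_value(has_url, is_spam), 3))
```

A common rule of thumb treats IV below about 0.02 as uninformative and above about 0.3 as strong; the abstract does not state which threshold, if any, the thesis applied when selecting comment attributes.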
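The interaction features are computed on a user network built from replies, follows, co-commenting on the same news item and posting similar comments. The abstract does not name the six topological measures, so this sketch shows only two common ones, degree and the local clustering coefficient, on an undirected, unweighted graph that merges all four interaction types. The merging, the helper names (`build_graph`, `degree`, `clustering`) and the sample users are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (user, user) interaction pairs; self-loops are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree(adj, node):
    """Number of distinct users this user interacted with."""
    return len(adj.get(node, ()))


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


if __name__ == "__main__":
    # Hypothetical interactions (reply, follow, same news, similar comment) merged into one graph
    interactions = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"), ("u1", "u4")]
    g = build_graph(interactions)
    for user in sorted(g):
        print(user, degree(g, user), round(clustering(g, user), 3))
```

Follow relations are naturally directed; a fuller version might keep in-degree and out-degree separately, or build one graph per interaction type instead of merging them.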
Keywords/Search Tags: Spam Review Recognition, Spammer Detection, Human Dynamics, Network Science, GBDT