
Research On Review Spam Detection Based On Graph Neural Networks

Posted on: 2022-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: J M Huang
Full Text: PDF
GTID: 2518306557468834
Subject: Computer technology
Abstract/Summary:
As cyberspace continues to expand, the reliability and authenticity of the information disseminated in it have become increasingly important, especially in e-commerce, where potential consumers check online reviews before making a purchase decision. These reviews are easy to obtain from the relevant websites, but their authenticity is not verified, which raises concerns about their reliability. In addition, some users post false reviews to mislead others into purchasing a target product, causing economic losses. Effectively detecting unreliable and false reviews is therefore of great practical significance for maintaining economic order in cyberspace.

Building on linguistic and behavioral features extracted from review data, this thesis uses a graph neural network to extract correlation features between reviews, and combines under-sampling with random forests to classify the extracted features. The specific research content is as follows:

1. The entity attributes of the real-world Yelp data set are analyzed and combined to construct a feature set that can effectively identify false reviews. The feature set covers two aspects, review behavior and review text. First, 10 behavioral features are constructed along the dimensions of time, number of reviews, ratings, and interactions; then 2 text features are constructed from the word count and the similarity of the text. These features characterize false reviews from different dimensions, and experiments verify their importance.

2. The EasyEnsemble-RF algorithm is proposed for detecting false reviews. This thesis analyzes the characteristics of the base classifier in EasyEnsemble and replaces it with a random forest, so that the model can withstand interference from abnormal data in the minority class. The data are first processed by under-sampling, and the random forests then serve as the basis for the final classification result. Experiments show that this effectively improves the evaluation metrics of the original model.

3. An improved graph neural network algorithm based on TrustRank is proposed to find as many false reviews as possible. This thesis introduces the concept of a suspicious value to quantify how likely a review node in the review network is to be false. The proposed features are used to initialize the suspicious value of each node, and TrustRank then propagates the suspicious values through the network until a set number of iterations is reached. The nodes are then divided according to their suspicious values, and four sampling strategies are used to generate node neighborhoods; adaptive sampling is used to enrich neighborhood information, resist noise, and improve neighborhood quality. Finally, the neighborhood information is aggregated to generate node embeddings, which are concatenated with the paragraph embeddings extracted by Doc2vec and fed into EasyEnsemble-RF for training and prediction. Experimental analysis shows that the proposed method achieves high recall.
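The suspicious-value propagation described in item 3 can be illustrated with a minimal sketch of TrustRank-style (biased PageRank) score propagation. The abstract does not give the thesis's actual update rule, so everything here is an assumption for illustration: the function name `propagate_suspicion`, the adjacency-list graph, the damping factor of 0.85, the iteration count, and the toy seed values standing in for feature-based initial scores.

```python
from typing import Dict, List


def propagate_suspicion(
    neighbors: Dict[str, List[str]],
    seed: Dict[str, float],
    damping: float = 0.85,   # assumed value, not taken from the thesis
    iterations: int = 20,    # "set number of times" in the abstract; value assumed
) -> Dict[str, float]:
    """Propagate seed suspicious values over a review graph for a fixed
    number of iterations, in the style of TrustRank (biased PageRank).

    neighbors: adjacency list; every neighbour must itself be a key.
    seed: non-negative initial suspicious value per node.
    """
    total = sum(seed.get(node, 0.0) for node in neighbors)
    if total <= 0:
        raise ValueError("seed scores must have a positive sum")
    # Normalise the seed so it forms a probability distribution.
    d = {node: seed.get(node, 0.0) / total for node in neighbors}
    scores = dict(d)

    for _ in range(iterations):
        # Teleport term: a (1 - damping) share always returns to the seed.
        nxt = {node: (1.0 - damping) * d[node] for node in neighbors}
        for node, outs in neighbors.items():
            if not outs:
                # Isolated node: hand its mass back to the seed distribution.
                for target in neighbors:
                    nxt[target] += damping * scores[node] * d[target]
                continue
            # Split this node's score evenly among its neighbours.
            share = damping * scores[node] / len(outs)
            for target in outs:
                nxt[target] += share
        scores = nxt
    return scores


# Hypothetical toy review network: edges link reviews that are related,
# e.g. by a shared reviewer or product.
graph = {
    "r1": ["r2", "r3"],
    "r2": ["r1"],
    "r3": ["r1", "r4"],
    "r4": ["r3"],
}
seed_scores = {"r1": 0.9, "r2": 0.1, "r3": 0.2, "r4": 0.1}

scores = propagate_suspicion(graph, seed_scores)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the scores always sum to 1, so they can be compared across nodes, and a review adjacent to a highly suspicious seed picks up part of its score. How the thesis thresholds or buckets the resulting values before neighborhood sampling is not stated in the abstract and is left out here.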
Keywords/Search Tags: Review Spam, EasyEnsemble, Graph Neural Networks, Document and Node Embeddings, Feature Engineering