| The Internet hosts massive amounts of heterogeneous user-generated content, including user reviews of services and products on public review websites and e-commerce platforms. Users typically share their consumption experience in several ways, such as uploading photos, posting text comments, and giving ratings, within the limits the platform allows. As research on personalized recommendation, user profiling, and text mining and analysis has deepened, this large body of user feedback has become an increasingly indispensable source of research data. Online reviews influence, and can even steer, the purchasing decisions of new users, and are therefore closely tied to brand reputation and the economic interests of businesses. To obtain illicit income, some groups or individuals post deceptive reviews to attack competitors or promote their own image, so filtering fake reviews has become a key task. Traditional machine-learning-based fake review detection methods rely on discrete spam features hand-designed from expert experience, and most of them treat fake review detection as a binary text classification task. Although they can achieve good classification accuracy, costly feature engineering is generally unavoidable. Methods based on probabilistic graphical models usually model the raw review data as a network and explore the probabilistic dependencies between entities in the review system, transforming fake review detection into probabilistic inference and ranking on the graph. Graph neural networks (GNNs) can preserve network structural similarity and node attribute information at the same time, so some studies have used user reviews to build end-to-end GNN-based fraud detection methods. However, fake reviews account for only a small fraction of large-scale user review data. The class imbalance problem, a key factor affecting detection performance, needs further research in GNN-based fake review 
recognition methods. Ensemble learning, data re-sampling, and loss function engineering are powerful tools for addressing class imbalance in deep learning. Therefore, in response to the class imbalance faced by GNN-based fake review detection models, this paper proposes a fake review identification method based on an ensemble hierarchical graph attention network. The main work of this paper is as follows: (1) Related work in China and abroad on deceptive review identification is surveyed, common features of fake reviews are summarized, and existing fake review detection methods are analyzed in terms of the learning methods and feature information they use. Mainstream techniques for class-imbalanced classification are summarized, along with studies that specifically address skewed class distributions in fake review identification. (2) To address class imbalance, an ensemble hierarchical graph attention network (En-HGAN) fake review detection method is proposed. The method models the review system as a multi-view network that contains multiple composite relations between reviews from different perspectives, and uses a hierarchical graph attention network (HGAN) to learn more accurate vector representations of reviews. Random sub-sampling strategies are combined with the Bagging framework, and multiple differentiated HGAN sub-models, obtained by perturbing the input samples, are integrated to improve generalization while reducing the loss of useful information. Experimental results on real imbalanced review data show that En-HGAN achieves good detection performance. (3) To reduce the cost of model training under a skewed class distribution without seriously degrading detection performance, the SEn-HGAN fake review recognition method is proposed, combining the idea of snapshot ensembles with the Focal Loss function designed for imbalanced data. During the 
entire training process of HGAN, Focal Loss guides the learning direction, the learning rate is adjusted periodically using cyclic cosine annealing, and perturbation of the model parameters is used to integrate multiple HGAN sub-models that converge to different local optima. Compared with En-HGAN, experimental results on a real imbalanced fake review data set show that SEn-HGAN balances good recognition performance against a moderate training cost. |
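
The three imbalance-handling ingredients named in the abstract (random sub-sampling inside Bagging, Focal Loss, and cyclic cosine annealing for snapshot ensembles) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The function names, the defaults `alpha=0.25`, `gamma=2.0` and `lr_max=0.01`, and the choice to keep every fake review in each bag are hypothetical, and the HGAN model itself is omitted.

```python
import math
import random


def balanced_bags(labels, n_bags, seed=0):
    """Random sub-sampling for Bagging (assumed scheme): each bag keeps every
    minority index (label 1, fake) plus an equal-sized random draw, without
    replacement, of majority indices (label 0, genuine)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(minority), len(majority))
    return [minority + rng.sample(majority, k) for _ in range(n_bags)]


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t),
    where p is the predicted probability of the positive (fake) class."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max=0.01):
    """Cyclic cosine annealing: the rate restarts at lr_max at the start of
    each cycle and decays toward zero by the end of it. A snapshot of the
    model would be saved at the end of every cycle."""
    cycle_len = math.ceil(total_steps / n_cycles)
    phase = (step % cycle_len) / cycle_len
    return 0.5 * lr_max * (math.cos(math.pi * phase) + 1.0)


def average_predictions(snapshot_probs):
    """Combine sub-models by averaging their per-review fake probabilities."""
    n_models = len(snapshot_probs)
    return [sum(col) / n_models for col in zip(*snapshot_probs)]


if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0, 1]  # toy data: two fake reviews, four genuine
    print(balanced_bags(labels, n_bags=3))
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))  # easy vs. hard fake review
    print([round(cyclic_cosine_lr(s, 100, 5), 5) for s in (0, 10, 19, 20)])
    print(average_predictions([[0.2, 0.8], [0.4, 0.6]]))
```

In this sketch, bags differ only in which genuine reviews they sample, which is one way to obtain differentiated sub-models by perturbing the input samples. The learning-rate restarts play the analogous role for a snapshot ensemble: each cycle ends near a different local optimum, so several sub-models are obtained from a single training run.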