Deep learning uses deep neural networks to learn representations of samples and then solve downstream tasks. It has achieved great success in many fields, e.g., object detection, machine translation, and self-driving. However, much data exists in the form of graphs, and a growing number of learning tasks require the ability to handle such complex data. Conventional deep learning methods (e.g., the convolutional neural network, CNN) cannot be applied directly to graph data. Driven by the powerful representation ability of deep learning, graph neural networks (GNNs) have developed rapidly in recent years.

In the information age, human activity is moving into cyberspace, and the Internet has become inextricably interwoven with modern society. Many frauds have expanded from offline to online, so maintaining order in this virtual world has become an urgent problem. Fraud detection is an anomaly detection problem, and GNNs have been very successful at solving this kind of task. Recently, the Camouflage-Resistant GNN (CARE-GNN) was proposed and achieved state-of-the-art results on fraud detection tasks by dealing with relation camouflage and feature camouflage. This thesis conducts systematic research based on CARE-GNN from a variety of aspects and proposes new models and methods that boost performance. The research contents of this thesis are as follows:

(1) Residual Layered CARE-GNN (RLC-GNN) is proposed, based on a new deep architecture for spatial-based GNNs. Stacking multiple layers in the conventional way, defined by hops, leads to a rapid performance drop; the experiments of CARE-GNN show that the single-layer model performs best. However, the single-layer structure prevents the model from extracting more useful information to fix potential mistakes, so performance depends heavily on that single layer. To solve this problem, RLC-GNN makes the model learn incrementally and correct mistakes continuously by combining a layered structure with residual connections to form a complementary
relation with the deep architecture. Extensive experiments are conducted on the Yelp and Amazon datasets, with recall, AUC, and macro-F1 chosen as evaluation metrics. The results show that RLC-GNN obtains improvements of up to 5.66%, 7.72%, and 9.09% in recall, AUC, and macro-F1, respectively, on the Yelp dataset, and up to 3.66%, 4.27%, and 3.25% in the same metrics on the Amazon dataset.

(2) A review of RLC-GNN reveals three problems, and three methods are proposed to address them. The problems are: the lack of comprehensive consideration of the patterns of users' behaviors; the training difficulty inherent to deep models; and the usage of neighboring information reaching its limit. The corresponding methods are: measuring similarity via cosine distance to consider the patterns of users' behaviors comprehensively; partial neighborhood normalization, which normalizes center nodes batch-wise and node-wise using only the statistics of neighboring nodes with sufficient similarity, to accelerate training; and intermediate information supplement, which expands the receptive field to 2 hops starting from the middle layer to introduce new information for further learning. Experiments are conducted with the same datasets and settings as in (1), and the effectiveness of each method is verified. With the three methods integrated into RLC-GNN-6, significant performance gains are obtained on the Yelp dataset: recall, AUC, and macro-F1 improve by 4.81%, 6.62%, and 6.81%, respectively. On the Amazon dataset, recall and AUC increase by 1.65% and 0.29%, respectively, and the decrease in macro-F1 is analyzed.

(3) CARE-GNN neglects the difference in the importance of neighboring nodes to their center node and does not effectively utilize the semantics in the multi-relation graph. Inspired by the Graph Attention Network (GAT), the Two-Stage Graph Attention Network (TS-GAT), which improves the aggregation framework of CARE-GNN by
employing an attention mechanism and taking advantage of the deep architecture of RLC-GNN, is proposed. In intra-relation aggregation, an attention layer computes attention coefficients for neighboring nodes, making the network pay more attention to informative neighbors. In inter-relation aggregation, the semantics of the embedding aggregated under each relation are used to compute the attention coefficient of that relation for each node in a mini-batch. Experiments are conducted with the same datasets and settings as in (2). The results show that TS-GAT achieves improvements of 1.31%, 0.27%, and 2.07% in recall, AUC, and macro-F1 on the Yelp dataset, and gains of 1.07%, 0.39%, and 0.79% in the same three metrics on the Amazon dataset. TS-GAT clearly outperforms several state-of-the-art methods on the Yelp dataset and is competitive on the Amazon dataset. The slight inferiority on the Amazon dataset is analyzed, and potential methods for further improvement are provided.
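To illustrate the idea behind the similarity-based methods of contribution (2), the following is a minimal NumPy sketch, not the thesis's actual implementation: `cosine_filter` keeps only neighbors sufficiently similar to the center node under cosine similarity, and `partial_neighborhood_normalize` standardizes the center node using statistics of those kept neighbors only. The threshold value, feature dimensions, and exact normalization rule here are hypothetical.

```python
import numpy as np

def cosine_filter(center, neighbors, threshold=0.5):
    # Keep only neighbors whose cosine similarity to the center node
    # exceeds `threshold` (a hypothetical cutoff for illustration).
    norms = np.linalg.norm(neighbors, axis=1) * np.linalg.norm(center)
    sims = neighbors @ center / np.clip(norms, 1e-12, None)
    keep = sims > threshold
    return neighbors[keep], sims[keep]

def partial_neighborhood_normalize(center, kept_neighbors, eps=1e-5):
    # Normalize the center node with statistics drawn from the
    # sufficiently similar neighbors only (the "partial neighborhood").
    mu = kept_neighbors.mean(axis=0)
    sigma = kept_neighbors.std(axis=0)
    return (center - mu) / (sigma + eps)

# Toy example: one center node and 4 neighbors with 3-dim features.
center = np.array([1.0, 0.0, 1.0])
neighbors = np.array([[1.0, 0.0, 1.0],
                      [-1.0, 0.0, -1.0],
                      [0.9, 0.1, 1.1],
                      [0.0, 1.0, 0.0]])
kept, sims = cosine_filter(center, neighbors)   # 2 of 4 neighbors survive
normed = partial_neighborhood_normalize(center, kept)
```

In this toy run the opposite-direction and orthogonal neighbors are discarded, so the normalization statistics come only from nodes whose behavior pattern resembles the center's.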
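The two aggregation stages of contribution (3) can be sketched as follows. This is an illustrative NumPy toy, not the thesis's implementation: stage one computes GAT-style attention coefficients over one node's neighbors under a single relation, and stage two computes attention coefficients over the resulting per-relation embeddings. The projection `W`, attention vectors `a` and `q`, and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_relation_aggregate(h_center, h_neigh, W, a):
    # Stage 1: attention over one node's neighbors under one relation.
    # h_center: (F,), h_neigh: (N, F); W: (F, F') projection,
    # a: (2*F',) attention vector (both hypothetical parameters).
    zc = h_center @ W                                  # (F',)
    zn = h_neigh @ W                                   # (N, F')
    pairs = np.concatenate(
        [np.repeat(zc[None, :], len(zn), axis=0), zn], axis=1)
    scores = pairs @ a                                 # (N,) raw scores
    scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
    alpha = softmax(scores)                            # neighbor coefficients
    return alpha @ zn                                  # relation-level embedding

def inter_relation_aggregate(z_relations, q):
    # Stage 2: attention over the per-relation embeddings of one node,
    # using their semantics to weight each relation.
    beta = softmax(z_relations @ q)                    # (R,) relation coefficients
    return beta @ z_relations                          # final node embedding

# Toy run: 3 relations, 4 neighbors per relation, 8-dim features.
F = 8
W, a, q = rng.normal(size=(F, F)), rng.normal(size=2 * F), rng.normal(size=F)
z = np.stack([intra_relation_aggregate(rng.normal(size=F),
                                       rng.normal(size=(4, F)), W, a)
              for _ in range(3)])                      # (3, F)
out = inter_relation_aggregate(z, q)                   # (F,)
```

Because both stages use softmax, the coefficients over neighbors and over relations each sum to one, so informative neighbors and relations receive proportionally larger weight in the aggregated embedding.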