| With the rapid development of the mobile internet, internet services are becoming increasingly user-friendly. Supported by big data and cloud computing, these services are improving people’s lives in many respects. However, because data volume is growing explosively, some criminals hide fraudulent activity in massive data, evade anti-fraud systems by posing as normal internet users, and then carry out a series of criminal acts for profit. Fraud detection on the internet faces three main challenges. First, the classes are extremely imbalanced: because the internet has so many users, fraudsters account for only 1/10000 or less of normal users, which makes it difficult for fraud detection models to identify them efficiently. Second, fraudsters actively counter the anti-fraud system: they evade the detection model by changing their past behavior or imitating the behavior of normal users, and the rapid evolution of fraud methods poses new challenges to the detection system. Third, abundant noise and scarce labels weaken the supervision signal for feature learning: most data lack complete labels, and noisy information interferes with model training. How to detect fraud in new scenarios has become another challenge. Based on the above, this thesis approaches fraud detection at the data level and at the graph structure level. The main work is as follows: (1) At the data level, because data sets in fraud detection tasks are extremely imbalanced and traditional data sampling algorithms do not apply well to graph-structured data, this thesis proposes a heterogeneous-graph-based oversampling algorithm (HGSMOTE), which applies traditional oversampling to heterogeneous graphs. Because heterogeneous graphs carry rich feature information and previous oversampling algorithms lose the corresponding attribute information when generating new samples, a heterogeneous graph neural network based on attribute completion is used to aggregate node information in the feature extraction stage, and a heterogeneous graph attention network performs node classification. Experimental results on the Amazon and YelpChi data sets demonstrate the effectiveness of HGSMOTE. (2) At the graph structure level, fraudsters usually camouflage their behavioral characteristics to resist detection by anti-fraud systems; in graph terms, they tend to establish edges with normal users. This thesis therefore proposes a heterogeneous graph fraud detection framework (HAFD) based on a multi-layer attention mechanism. Specifically, a relationship fusion module and a neighborhood fusion module extract graph structure features; an information awareness module removes noise and fuses feature information to generate the final node embeddings; and an imbalance-oriented classification module corrects for the imbalance in the data, enabling classification of fraudulent nodes. Experimental results on the Amazon and YelpChi data sets show that HAFD effectively filters out noise during model training and accurately identifies fraudsters who disguise their characteristics. |
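
As an illustration of the "traditional oversampling algorithm" that the abstract refers to, the following is a minimal sketch of classic SMOTE-style interpolation on plain feature vectors. It is not the HGSMOTE algorithm itself, which operates on heterogeneous graphs; the function name `smote_oversample`, its parameters, and the toy `fraud` vectors are assumptions introduced only for this example.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length feature vectors (lists of floats).
    Assumes at least two minority samples. Hypothetical helper for
    illustration; it ignores graph structure entirely.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest minority neighbours at random.
        neighbour = rng.choice(others[:k])
        # The new sample lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class (fraud) feature vectors, invented for the example.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_samples = smote_oversample(fraud, n_new=5)
```

Each synthetic sample is a convex combination of two real minority samples, so it carries no node attributes or edges of its own. This is the gap the abstract points to when it notes that earlier oversampling methods lose attribute information on graph-structured data.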