An Empirical Study On Internet Finance Anti-fraud Based On CNN-XGBoost

Posted on: 2021-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Liu
GTID: 2428330605457322
Subject: Applied Statistics
Abstract/Summary:
The purpose of financial fraud detection is to predict potentially fraudulent users and fraudulent behavior, and so reduce the losses of financial institutions. With the rapid development of Internet finance, the demand for financial anti-fraud solutions has become more urgent. Machine learning methods are widely used in fraud detection, and both tree models and neural networks are important classification methods. A tree model is more interpretable, but its classification performance depends heavily on manually designed features. A neural network can extract features automatically, but it overfits more easily. This thesis therefore studies a combination of CNN and XGBoost that draws on the advantages of both algorithms.
XGBoost is a commonly used classification method. It introduces regularization to control model complexity, which greatly improves resistance to overfitting. Its improvement over traditional tree models is substantial, but XGBoost still relies on manual feature engineering. CNN can complete feature extraction automatically, combining and filtering the important features of the data in a high-dimensional space, but as the features become more abstract, overfitting seriously degrades the performance of the network. In this thesis, the low-abstraction features learned by the CNN are added to the original features to train XGBoost. In this way, the automatic feature extraction of CNN and the overfitting resistance of XGBoost are used together to raise the upper bound of model fit.
The main work of this thesis is as follows:
(1) Preprocessing and prior analysis of the data. Using credit card fraud data with a ratio of positive to negative samples close to 1:12, an exploratory analysis of the original data is performed to visualize its distributions and missing values. Part of the data is then preprocessed: KNN is used to fill in the missing values of the important features, which are selected by XGBoost according to the average gain of node splits, and categorical variables are encoded as labels.
(2) Design and implementation of feature extraction based on CNN. The CNN is composed of a convolutional layer and a fully connected layer. The convolutional layer performs feature extraction, and the fully connected layer maps the feature space to the target space. Because the fully connected layer overfits very easily, only the output of the convolutional layer is extracted after network training is complete. This output is treated as a set of new derived variables and added to the original features to train XGBoost, which learns further to produce the classification result.
(3) Comparison of the classification results of the XGBoost algorithm, the CNN algorithm and the CNN-XGBoost hybrid model, with ROC-AUC, F1-score and the break-even point (BEP) selected as the performance indicators for binary classification. Experimental results show that the CNN-XGBoost hybrid model classifies better than either model alone, indicating that the method combines the advantages of CNN and XGBoost and improves classification performance while reducing the dependence of feature engineering on manual experience.
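The three indicators named in item (3) can be illustrated with a minimal sketch. This is not the code used in the thesis: the function names `roc_auc`, `f1_score` and `break_even_point` are hypothetical, labels are assumed to be 0/1 with 1 marking fraud, and BEP is assumed here to be evaluated as precision among the top-k scored samples with k equal to the number of positives (at that cut precision and recall coincide).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 at a fixed threshold, using F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_fraud = s >= threshold
        if predicted_fraud and y == 1:
            tp += 1
        elif predicted_fraud and y == 0:
            fp += 1
        elif not predicted_fraud and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # Assumption: F1 is reported as 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def break_even_point(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Precision (= recall) when exactly k samples are flagged, k = #positives.

    Assumption: ties at the cut are broken by sort order.
    """
    k = sum(1 for y in y_true if y == 1)
    if k == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, y in ranked[:k] if y == 1)
    return hits / k
```

All three functions take the true labels and the predicted fraud scores of one model, so the same held-out set can be scored once per model (XGBoost, CNN, hybrid) and the indicators compared directly. ROC-AUC and BEP depend only on the ranking of the scores, while F1 also depends on the chosen threshold, which matters on imbalanced data such as the roughly 1:12 set described above.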
Keywords/Search Tags: feature engineering, fraud detection, ROC-AUC, CNN, XGBoost