An Empirical Study On Internet Finance Anti-fraud Based On CNN-XGBoost

Posted on:2021-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Liu

Full Text:PDF

GTID:2428330605457322

Subject:Applied Statistics

Abstract/Summary:


The purpose of financial fraud detection is to predict potentially fraudulent users and fraudulent behavior, and so reduce the losses of financial institutions. With the rapid development of Internet finance, the demand for financial anti-fraud solutions has become more urgent. Machine learning methods are widely used in fraud detection, and both tree models and neural networks are important classification methods. A tree model is more interpretable, but its classification performance depends heavily on manually designed features. A neural network can extract features automatically, but it overfits more easily. This thesis therefore combines the advantages of the two approaches and studies the combination of CNN and XGBoost.

XGBoost is a commonly used classification method. It introduces regularization to control model complexity, which greatly improves the model's resistance to overfitting. Compared with traditional tree models the improvement is marked, but XGBoost still relies on manual feature engineering. A CNN can complete feature extraction automatically, combining and filtering the important features of the data in a high-dimensional space, but as the features become more abstract, overfitting severely degrades the network's performance. In this thesis, the low-abstraction features from the CNN are added to the original features to train XGBoost. In this way, the automatic feature extraction of the CNN and the overfitting resistance of XGBoost are used together to raise the upper bound of the model's fit.

The main work of this thesis is as follows:

(1) Preprocess the data and perform a prior analysis. Based on credit card fraud data with a ratio of positive to negative samples close to 1:12, an exploratory analysis of the original data was carried out to understand visually the distribution and missingness of the data. Part of the data was then preprocessed: KNN was used to fill in the missing values of the important features, which were selected by XGBoost according to the average gain of node splits, and the categorical variables were label-encoded.

(2) Design and implement feature extraction based on CNN. The CNN is composed of a convolutional layer and a fully connected layer. The convolutional layer performs feature extraction, and the fully connected layer maps the feature space to the target space. Because the fully connected layer overfits very easily, once network training is complete only the output of the convolutional layer is extracted as new derived variables. These are added to the original features to train XGBoost, which learns further to produce the classification result.

(3) Finally, the classification results of the XGBoost algorithm, the CNN algorithm and the CNN-XGBoost hybrid model are compared, with ROC-AUC, F1-score and the break-even point (BEP) selected as the performance indicators for binary classification. The experimental results show that the CNN-XGBoost hybrid model classifies better, indicating that the method combines the advantages of CNN and XGBoost and improves classification performance while addressing the reliance of feature engineering on human experience.
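The three indicators named in step (3) can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's evaluation code: the function names, the 0.5 threshold used for F1, the threshold-sweep approximation of the BEP, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a
    random negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall at a fixed (assumed) threshold."""
    p, r = precision_recall(y_true, scores, threshold)
    return 2 * p * r / (p + r) if p + r else 0.0


def break_even_point(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate BEP: sweep the distinct scores as thresholds and report
    the mean of precision and recall where the two are closest."""
    best_gap, bep = float("inf"), 0.0
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall(y_true, scores, t)
        gap = abs(p - r)
        if gap < best_gap:
            best_gap, bep = gap, (p + r) / 2
    return bep


# Hypothetical toy data: 1 = fraud, 0 = legitimate; scores are model outputs.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, scores)
bep = break_even_point(y_true, scores)
print(f"ROC-AUC={auc:.3f}  F1={f1:.3f}  BEP={bep:.3f}")
```

ROC-AUC and BEP depend only on how the scores rank the samples, whereas F1 depends on the chosen decision threshold, which is one reason to report them together on imbalanced data such as the roughly 1:12 sample described above.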

Keywords/Search Tags:

feature engineering, fraud detection, ROC-AUC, CNN, XGBoost


Related items

1	Research On The Detection Model Of Credit Card Transaction Fraud Based On GAN-XGBoost
2	Application Research Of Network Advertising Fraud Detection System Based On XGBoost
3	Research On The Application Of Artificial Intelligence In Insurance Anti-fraud
4	Research On Transaction Fraud Detection Based On Rule Attention Machine
5	Design And Implementation Of Anti-fraud Risk Prediction System
6	KPI Anomaly Detection Based On Multi-dimensional Feature Extraction And XGBoost
7	Advertising Anti-fraud Research Based On Catboost Model
8	Research On Credit Card Fraud Detection Based On Ensemble Learning
9	Fraud Detection On Telecommunication Business Base On Feature Tree Analysis
10	Research On Accurate Hybrid Recommendation Method Based On Feature Engineering And Efficient Gradient Boosting Decision Tree Algorithm