With the rapid development of "Internet + Finance", the detection of credit card transaction fraud has become a central concern for card issuers and an important research topic for researchers at home and abroad. Sample imbalance is a key factor limiting the accuracy of fraud detection, and the accuracy of existing fraud detection models still leaves room for improvement. To address these two problems, this paper combines Generative Adversarial Nets (GAN) and eXtreme Gradient Boosting (XGBoost) to construct a GAN-XGBoost credit card fraud detection model. The feasibility and validity of the model are verified through comparative experiments on a real historical transaction data set. The main work of this paper is as follows: (1) To address the imbalance of transaction data samples, the GAN algorithm is used to augment the data by generating new minority-class samples. The generation process requires no tedious sampling sequences; rather than directly copying or averaging the real data, it only needs to sample and derive the new samples directly. The data generated by the GAN closely approximates the real data, which effectively limits the influence of unrealistic generated data on classification and detection accuracy. (2) To avoid problems in model training such as overfitting, high computational complexity, weak adaptability, and relatively low classification accuracy, the XGBoost algorithm, an ensemble learning algorithm, is introduced. The algorithm adds regularization terms to the objective function when searching for the optimal solution, which balances model complexity and effectively avoids overfitting, and it supports multi-threaded parallel computation; together these can improve the accuracy of fraud detection. (3) To demonstrate the effectiveness of the proposed detection model, two groups of comparative experiments were performed. The first compares different sample balancing methods, each combined with XGBoost; the second compares different classification algorithms, each trained on GAN-balanced data. In addition, to give a more intuitive picture of classifier performance during model evaluation, this paper adds a further evaluation index, the Area Under the Precision-Recall Curve (AUPRC).
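To illustrate why AUPRC is informative on imbalanced transaction data, the following is a minimal sketch of the step-wise approximation of the area under the precision-recall curve (often called average precision). It is not the paper's implementation: the function name `average_precision`, the labels, and the scores are hypothetical values chosen only for illustration, and the sketch assumes all scores are distinct (tied scores are not grouped).

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    y_true: list of 0/1 labels (1 = fraud, the minority class).
    scores: list of predicted fraud scores, higher = more suspicious.
    Assumes distinct scores; tied scores are ranked in input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive (fraud) label")
    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            area += true_positives / rank
    return area / n_pos


# Hypothetical toy data: 2 frauds among 10 transactions (not from the paper).
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.10, 0.20, 0.05, 0.90, 0.30, 0.15, 0.80, 0.25, 0.60, 0.12]

# A trivial classifier that labels everything as legitimate still reaches
# 80% accuracy here, although it detects no fraud at all.
all_legit_accuracy = sum(1 for y in y_true if y == 0) / len(y_true)

auprc = average_precision(y_true, scores)
print(f"accuracy of 'all legitimate' baseline: {all_legit_accuracy:.2f}")
print(f"AUPRC of the scored ranking: {auprc:.4f}")
```

In this toy case the two frauds are ranked first and third, so the precision values at the two recall steps are 1 and 2/3, and their mean is the reported AUPRC. Unlike accuracy, the metric depends only on how well the fraud cases are ranked above legitimate ones, which is why it suits heavily imbalanced data.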