In recent years, the demand for credit cards has continued to rise, and major banks and financial institutions have issued ever more cards to meet it, which has also increased the risk of credit card fraud. With the progress of financial technology, machine learning, deep learning, and artificial intelligence are increasingly applied in finance, so fraud risk identification no longer depends entirely on the subjective judgment of experts. Building fraud detection models to assist decision-making has become the trend, and many banking institutions also optimize their risk strategies based on such models, which has led to a declining fraud rate. A good fraud detection model can therefore improve the recognition rate, reduce the losses caused by fraud, and promote the healthy development of the credit card industry. In this thesis, the KS, AP, and AUC values, which are derived from the confusion matrix, are used to evaluate model performance, and these core metrics are compared to distinguish the strengths and weaknesses of the models. Credit card fraud detection is an imbalanced binary classification problem in which fraudulent samples are far fewer than non-fraudulent samples. A model trained directly on the raw dataset tends to predict all samples as non-fraudulent, which defeats the original purpose of distinguishing fraudulent from non-fraudulent samples and leads to misleading conclusions. The dataset therefore needs to be balanced before modeling; the commonly used methods are oversampling, undersampling, and combined sampling. In this thesis, we propose a comprehensive sampling algorithm that controls the sampling process based on the feature importance produced by a tree model fitted to the data, and the datasets balanced by the various sampling algorithms are used for training and prediction under the LightGBM
algorithm with the same parameter settings. The dataset balanced by the proposed method yields better model performance, with large improvements in the core metrics KS, AP, and AUC. The fraud detection model established in this thesis is a Stacking fusion model with XGBoost, LightGBM, and CatBoost as the base learners and logistic regression as the secondary learner. Experiments show that, although it has higher model complexity, it outperforms the individual classifiers, with a very significant improvement in the core metrics and remarkable fraud detection performance, but it overfits noticeably. To address the overfitting, we propose a deep neural network initialized with tree-model weights to refine the predictions of the Stacking fusion model, and combine the two into a hybrid model. This network performs better than an uninitialized neural network, with significantly improved KS, AP, and AUC values. The experiments demonstrate that the final hybrid model not only further improves the AP and AUC values compared with the Stacking fusion model, but also narrows the KS gap between the training and test sets, which effectively alleviates the overfitting. This thesis combines algorithms to build its models, providing some guidance for the development of credit card fraud detection models.
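As an illustration of the three core metrics named above, the following is a minimal sketch of how KS, AUC, and AP can be computed from predicted scores by sweeping the decision threshold and accumulating confusion-matrix counts. It is not the thesis's own implementation: the function name `ranking_metrics` and the toy labels and scores are hypothetical, and the definitions assumed are the usual ones (KS as the maximum gap between the true positive rate and the false positive rate, AUC by the trapezoidal rule on the ROC curve, AP as the step-wise sum of precision weighted by the increase in recall).

```python
def ranking_metrics(y_true, y_score):
    """Return (ks, auc, ap) for binary labels (1 = fraud) and model scores.

    Hypothetical helper for illustration only; assumes both classes occur.
    """
    # Sort samples from highest to lowest score (most suspicious first).
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    tp = fp = 0                      # confusion-matrix counts at the current threshold
    ks = auc = ap = 0.0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(pairs):
        # Treat all samples tied at the same score as a single threshold step.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        tpr = tp / n_pos             # recall
        fpr = fp / n_neg
        precision = tp / (tp + fp)
        ks = max(ks, abs(tpr - fpr))                     # largest TPR/FPR gap
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2   # trapezoid under ROC
        ap += (tpr - prev_tpr) * precision               # precision weighted by recall gain
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return ks, auc, ap


# Toy example (made-up data): two fraudulent and two non-fraudulent samples.
ks, auc, ap = ranking_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"KS={ks:.3f}  AUC={auc:.3f}  AP={ap:.3f}")
```

Because all three values depend only on how the scores rank the samples, they remain informative on heavily imbalanced data, where plain accuracy would look high even for a model that labels every sample as non-fraudulent.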