Research On Credit Card Fraud Detection Based On Ensemble Learning Model

Posted on:2024-04-02

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Deng

GTID:2568307106486204

Subject:Applied statistics

Abstract/Summary:

With the spread of bank credit card services, paying by credit card has become the most popular way for young people to spend. Some dishonest customers disrupt market order by committing credit card fraud, which causes heavy economic losses for individuals, banks, and countries. To address this problem, this thesis builds a credit card fraud detection system that uses cardholders' historical consumption information and default records as experimental data, so that abnormal cardholders can be identified and alerted in time, reducing the bank's losses. Credit card fraud detection faces several difficulties, however: the data are large in volume, high-dimensional, and extremely imbalanced, which makes it hard to classify transactions accurately. To address these difficulties, an autoencoder (AE) is used for feature extraction, and Ensemble Learning and Cost-Sensitive Learning are used to handle the data imbalance.

The data come from Kaggle and contain credit card transactions between more than 800 merchants and 5,000 customers from January 1, 2019 through the end of June 2020. The data set covers both legitimate and fraudulent transactions and has nearly 1.3 million instances; the ratio of legitimate to fraudulent transactions is 99.5:0.5, a severe imbalance. Data processing begins with preprocessing: checking for missing and duplicate values, handling outliers, converting data formats, and deriving variables. An exploratory data analysis follows, with descriptive and visual analyses of single variables and then of variable pairs, to explore the structure of each variable and the relationship between the independent variables and the target variable. For feature extraction, the AE compresses 9 numerical variables into 6 to remove correlation and reduce dimensionality.

Three Ensemble Learning models, Random Forest (RF), LightGBM, and Adacost, are selected as prediction models, with recall, F2-score, and AUC as the evaluation criteria; feature importance scores are then analyzed on the best model. The results show that the AE+LightGBM model has the best predictive performance, with every indicator higher than those of the AE+RF and AE+Adacost models. Compared with the LightGBM model alone, the accuracy, precision, F2-score, and AUC of the AE+LightGBM model (98.70%, 30.45%, 0.6689, and 0.9962, respectively) improve slightly (by 0.05%, 0.24%, 0.0013, and 0.0003, respectively), while the recall (95.45%) falls by 0.38%. The AE+LightGBM model also generalizes best: the differences in precision, recall, F2-score, and AUC between the training and test sets (0.84%, 3.67%, 0.0225, and 0.0012, respectively) are smaller than those of the AE+RF model (1.54%, 9.75%, 0.0493, and 0.0031, respectively) and the AE+Adacost model (0.98%, 4.24%, 0.0265, and 0.0068, respectively).

Feature importance scores for all variables were obtained from the LightGBM model. Numerical features such as the total payment, the population of the residential area, and age have a significant impact on whether fraud occurs, and categorical features such as payment type and payment time period also play an important role in prediction. In conclusion, the AE+LightGBM model is the best in both predictive performance and generalization, and using the AE to reduce the dimensionality of the numerical variables does not degrade predictive performance.
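
The F2-score used as an evaluation criterion can be cross-checked against the precision and recall reported for the AE+LightGBM model. The following is a minimal sketch, assuming the standard F-beta definition with beta = 2; the function name `f_beta` is illustrative and does not come from the thesis.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    With beta > 1, recall is weighted more heavily than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Precision (30.45%) and recall (95.45%) reported for AE+LightGBM.
ae_lgbm_f2 = f_beta(0.3045, 0.9545, beta=2.0)
print(f"{ae_lgbm_f2:.4f}")  # 0.6689
```

Under this definition the reported precision and recall reproduce the reported F2-score of 0.6689. Because beta = 2 weights recall more heavily, the score stays near 0.67 even though precision is only about 30%, which suits a setting where missed fraud costs more than a false alarm.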

Keywords/Search Tags:

Credit Card Fraud Detection, Auto Encoding, Random Forest, LightGBM, Adacost

Related items

1	Research On Credit Card Fraud Detection Based On Machine Learning Algorithm
2	Research On Credit Card Fraud Detection Based On Random Forest
3	Improvement And Application Of Random Forest Algorithm In Credit Card Fraud Detection
4	Research On Credit Card Fraud Detection With Deep Forest
5	Application Of Machine Learning In Credit Card Fraud Detection
6	Study On The Credit Evaluation And Intelligent Decision-making System Of Credit Card Fraud Detection
7	Research On Credit Card Fraud Detection Model Via On Blockchain And Federated Learning
8	The Research On Classification Algorithms In Credit Card Fraud Detection
9	A Credit Card Fraud Detection Model Based On CNN-SVM
10	Credit Card Fraud Detection Using CNN