
A Comparison Of Machine Learning Methods For Credit Card Fraud Detection

Posted on: 2021-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: S H Xie  Full Text: PDF
GTID: 2428330605463452  Subject: Applied Statistics
Abstract/Summary:
Credit cards originated in the United States in the early 20th century and became popular in the 1960s. China began using credit cards in 1985. Today credit cards are used all over the world by a very large number of cardholders, who value them for their safety, speed, and convenience. Spending in advance against a credit line and repaying on a regular schedule also suits modern consumption habits. With the rapid development of the global economy and the rise of the Internet, online use of credit cards is becoming more and more common. The credit card fraud that has followed, however, is an obstacle to the development of the credit card business: tens of billions of dollars are lost to it worldwide every year. Establishing a fraud detection system is therefore very important for the business.

This paper compares several popular machine learning models for credit card fraud detection. Logistic regression is widely used for this task because it performs well in binary classification, trains quickly, yields feature coefficients, and is highly interpretable. Decision trees are used because they are easy to visualize and understand in classification problems and require little preprocessing: no separate feature selection step is needed, because the tree-building procedure itself chooses the best feature at each split. However, logistic regression carries a risk of underfitting and decision trees a risk of overfitting. We also apply several ensemble algorithms. Random forest, AdaBoost, and GBDT all use decision trees as base classifiers, so they inherit the advantages and disadvantages of decision trees, but they achieve higher precision than a single decision tree.

Credit card fraud data is highly imbalanced: of 280,000 credit card transactions, only a little over 400 are fraudulent. With such severe imbalance, a classifier is easily biased toward the majority class. We therefore combine each classifier with a sampling method and determine the classification threshold using a cost-sensitive matrix.

In the comparison, the BSMOTE-AdaBoost classifier performs best for credit card fraud detection, reaching a precision of 92% at a recall of 87%. However, because borderline SMOTE oversamples the training data, the number of training samples is multiplied. Boosting algorithms are also relatively slow, so BSMOTE-AdaBoost has the longest running time of all the algorithms compared: more than 5 times that of random forest without resampling. Random forest, moreover, achieves a recall above 82% and a precision above 97%. If computation time is taken into account, random forest is also suitable for credit card fraud detection.
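The cost-based choice of classification threshold mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's cost matrix, so the cost values, the toy scores and labels, and the function names below are all illustrative assumptions rather than the author's actual procedure.

```python
from typing import Sequence, Tuple


def total_cost(y_true: Sequence[int], scores: Sequence[float],
               threshold: float, cost_fn: float, cost_fp: float) -> float:
    """Total misclassification cost when score >= threshold is flagged as fraud.

    cost_fn is the (assumed) cost of missing a fraud, cost_fp the (assumed)
    cost of flagging a legitimate transaction. Correct decisions cost 0.
    """
    cost = 0.0
    for y, s in zip(y_true, scores):
        predicted_fraud = s >= threshold
        if y == 1 and not predicted_fraud:
            cost += cost_fn
        elif y == 0 and predicted_fraud:
            cost += cost_fp
    return cost


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   cost_fn: float, cost_fp: float) -> float:
    """Return the candidate threshold with the lowest total cost."""
    # Every distinct score is a candidate; infinity means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(y_true, scores, t, cost_fn, cost_fp))


def precision_recall(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of the fraud class at the given threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_fraud = s >= threshold
        if predicted_fraud and y == 1:
            tp += 1
        elif predicted_fraud and y == 0:
            fp += 1
        elif not predicted_fraud and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): 1 = fraud, 0 = legitimate; scores are model outputs.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]

# Assume a missed fraud costs ten times as much as a false alarm.
t = best_threshold(y_true, scores, cost_fn=10.0, cost_fp=1.0)
print(t, precision_recall(y_true, scores, t))
```

Raising the assumed cost of a missed fraud pushes the chosen threshold down, trading precision for recall; raising the cost of a false alarm does the opposite.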
Keywords/Search Tags:Credit card, Machine learning, Fraud detection, Imbalanced class