
The Application Of Data Mining Methods In Credit Card Default Prediction

Posted on: 2021-05-11   Degree: Master   Type: Thesis
Country: China   Candidate: M Q Wang   Full Text: PDF
GTID: 2428330605457298   Subject: Applied Statistics
Abstract/Summary:
Since China entered the "new era" of its development, it has made great achievements in economic and social development, and living standards have improved continuously. Meanwhile, as information technology has advanced, consumption habits have gradually changed, and credit card spending has come to occupy a pivotal position in daily consumption. However, in expanding their personal credit business, financial institutions have also taken on substantial risk: in credit card lending, the information asymmetry between financial institutions and clients exposes the institutions to potential default risk. How to effectively prevent and control credit card default risk while expanding credit card business is therefore a problem that financial institutions need to solve. In this regard, cardholders' historical consumption and repayment records are of great value. Banks and other financial institutions use clients' credit histories as a credit scoring tool, predicting from cardholders' monthly repayments whether a client will default or remain compliant. Based on overdue and default status, they then decide whether to market further to a cardholder or to restrict the cardholder's spending. Accurately identifying potential defaulters, reducing losses from customer credit default, and finding high-quality credit clients to improve operating efficiency are core issues that financial institutions have long been eager to solve. It is therefore of great practical significance to establish a scientific and effective credit card default prediction model.

Data mining technology has matured and is widely applied in many areas of society, and it has distinct advantages in the analysis of big data. Banks' databases record users' personal information, historical credit card transaction data, and so on, which to a certain extent reduces the financial risk caused by information asymmetry. Based on these massive historical data, risk departments can use data mining techniques to build accurate default prediction models and thereby improve financial risk warning and monitoring capabilities.

This paper introduces and applies four data mining methods (the Lasso-Logistic model, decision tree classification, the support vector machine, and the K-nearest neighbor method) in an empirical analysis of historical bank credit card data. Prediction accuracy is used as the measure of model performance. First, the raw data are preprocessed to address class imbalance and missing values. The sample is then divided into a training set and a test set, and confusion matrix analysis is used to compare the predictions of the four methods on the test set. On this basis, the four models are compared to obtain a credit card default risk assessment model with higher prediction accuracy, with the aim of accurately predicting credit card defaults, controlling credit risk in practice, and providing reliable theoretical and technical support for decision makers in banks and other financial institutions.

The empirical results show that the Lasso-Logistic regression performs best, with the highest accuracy in predicting compliance; it also identifies explanatory variables for predicting default. The support vector machine performs worst in compliance prediction. All four methods predict default less effectively than they predict compliance. The decision tree has the best default prediction rate, followed by the Lasso-Logistic model, and the K-nearest neighbor method performs worst in identifying defaulting clients. The comparison shows that the Lasso-Logistic model and the decision tree perform relatively better than the other two methods on both default and compliance prediction, whereas the support vector machine and the K-nearest neighbor method have an overall accuracy below 80%. In summary, in actual business practice, priority should be given to the Lasso-Logistic model and the decision tree for credit card default prediction. Although the prediction accuracy of several of the models is high, there is still considerable room for optimization. Other variables or credit evaluation methods could be introduced to further improve the models and raise prediction accuracy on the test set.
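The abstract compares models on three quantities read off a confusion matrix: overall accuracy, the rate at which defaulting clients are correctly identified, and the rate at which compliant clients are correctly identified. The following is a minimal sketch of that comparison, not the thesis's own code. The function names, the label encoding (1 = default, 0 = compliant), and the two sets of predictions are all assumptions made for illustration; the numbers are made up and are not results from the thesis.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells (assumed encoding: 1 = default, 0 = compliant)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    cells = Counter(zip(y_true, y_pred))
    return {
        "tp": cells[(1, 1)],  # defaulting clients predicted as defaulting
        "fn": cells[(1, 0)],  # defaulting clients predicted as compliant
        "fp": cells[(0, 1)],  # compliant clients predicted as defaulting
        "tn": cells[(0, 0)],  # compliant clients predicted as compliant
    }


def class_rates(y_true, y_pred):
    """Overall accuracy plus the per-class hit rates for default and compliance."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    defaults = c["tp"] + c["fn"]
    compliant = c["tn"] + c["fp"]
    return {
        "overall_accuracy": (c["tp"] + c["tn"]) / total if total else float("nan"),
        "default_rate": c["tp"] / defaults if defaults else float("nan"),
        "compliance_rate": c["tn"] / compliant if compliant else float("nan"),
    }


# Hypothetical test-set labels and predictions from two placeholder models.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predictions = {
    "model_a": [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    "model_b": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
}

results = {name: class_rates(y_true, y_pred) for name, y_pred in predictions.items()}
for name, rates in results.items():
    print(name, rates)
```

In this toy data, one placeholder model is better at recognizing compliant clients and the other at recognizing defaulting clients, which is the kind of trade-off the abstract reports; looking only at overall accuracy would hide it.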
Keywords/Search Tags:credit card default, data mining, Lasso, Logistic regression, decision tree, support vector machine, K-nearest neighbor method, confusion matrix