Research On Credit Scorecard Model Based On Data Mining Technology

Posted on: 2020-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2428330590982853
Subject: Applied Statistics
Abstract/Summary:
In recent years, with the rapid development of personal consumer credit in Internet finance, it has become increasingly important to analyze customer behavior from customers' characteristic behavior data, to refine customer classification, to give managers sound data support, and to manage risk more effectively. To estimate risk effectively and intuitively, this thesis introduces a credit scorecard model.

The thesis is organized into five chapters. The first chapter introduces the research background and significance of the topic and reviews research at home and abroad. The second chapter briefly introduces data mining technology and the development history and classification of scorecards. The third chapter preprocesses the data, covering missing value processing, outlier processing, data exploration, feature processing, and class imbalance processing. The fourth chapter builds the models: logistic regression, decision tree, and support vector machine are each fitted and evaluated by accuracy, recall, F1 value, and AUC value, and logistic regression is then chosen to build the credit scorecard model after weighing predictive ability against explanatory ability. The fifth chapter summarizes the work and discusses future prospects.

The results show that, of all the original features, those describing historical overdue behavior are used in the final model, indicating that historical overdue behavior has a significant impact on whether a user defaults. In model selection, the models are compared on two criteria. In predictive ability, the decision tree has the highest AUC value at 0.8373 and logistic regression reaches 0.8309, so the difference is small. In explanatory ability, logistic regression performs best and the decision tree second. In keeping with the purpose of this thesis, logistic regression is used to build the credit scorecard model, and data for 100,000 users are scored. The score distribution plot shows that relatively few users have low scores, which is consistent with the ratio of good to bad users.
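The abstract does not say how the fitted logistic regression is turned into scores. Purely as an illustration, the sketch below uses a common industry scaling convention in which the score is a linear function of the log-odds of being a good customer, and a fixed number of points doubles those odds. The base score of 600, the base odds of 50:1, the 20 points to double the odds, and both function names are hypothetical choices made for this example; none of them comes from the thesis.

```python
import math


def scorecard_scaling(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Return (factor, offset) such that score = offset + factor * ln(odds).

    base_score, base_odds and pdo are hypothetical values, not from the thesis:
    a customer with good:bad odds of base_odds receives base_score, and every
    pdo points doubles the odds.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_default_probability(p_default, factor, offset):
    """Map a predicted default probability to a score (higher = lower risk)."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    odds_good = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds_good)


factor, offset = scorecard_scaling()
for p in (0.01, 0.02, 0.10, 0.50):
    # p stands in for the default probability predicted by a fitted model.
    print(p, round(score_from_default_probability(p, factor, offset)))
```

Under this convention, users with a high predicted default probability land at the low end of the score range, so a population in which bad users are a minority would show few low scores, as the abstract reports.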
Keywords/Search Tags: data mining, credit scorecard, logistic regression, decision tree, support vector machine