Research On Individual Credit Evaluation Model Based On Multi-classification Algorithm

Posted on: 2021-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Zhou  Full Text: PDF
GTID: 2428330605474523  Subject: Applied statistics
Abstract/Summary:
Individual credit evaluation is important in the financial credit field. China's online financial enterprises have developed considerably, but their risk control remains weak. Lending Club was once the largest online lending platform worldwide thanks to its accurate risk control model, so studying Lending Club's credit evaluation rules is of great value to domestic online financial enterprises. This paper uses Lending Club data from 2007 to 2015.

Few studies address multi-classification models for individual credit evaluation. Previous research almost always takes "whether default or not" as the response variable when constructing a classification model. The "loan credit rating" is the rating Lending Club assigns to customers' loan applications, and it divides personal credit into seven grades, A through G. How the rating is obtained is not public, so this paper uses it as the response variable to build a multi-classification credit score model. The aim is to find a model that accurately estimates the loan credit rating, which is important for domestic financial enterprises.

This paper uses Python to construct multi-classification credit score models based on machine learning methods. Logistic regression, KNN, SVM, decision tree, random forest and LightGBM are selected, and ensemble learning models are then constructed according to the principles of accuracy and diversity. Taking accuracy and adjusted F1 score as the evaluation measures, this paper finds that the best models are LightGBM and a Stacking model with LightGBM, random forest and k-nearest neighbor as the base classifiers and multi-response logistic regression as the meta-classifier.

Inspired by Simpson's paradox, this paper finds that the class balance of the data affects the evaluation result of each model. In other words, the results of testing a model on balanced samples may differ from the results obtained when the model is applied to the unbalanced population. This paper points out that when the proportions of the grades in the population vary greatly, the following phenomenon may occur: model B is superior to model A on the sample, but inferior to model A when applied to the population. This paper gives a correction formula for the accuracy obtained when a model is applied to the population. In this study, the ranking of the models by accuracy is the same as the ranking by corrected accuracy. Therefore, the selected optimal model is not only effective but also robust, is not affected by the data balance, and is of practical application value.
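
The abstract does not reproduce the correction formula. As a minimal sketch, assume it takes the common form of weighting each grade's accuracy on the balanced sample (its recall) by that grade's share of the population. The function names, the three-grade setup, the confusion matrices and the population shares below are all hypothetical and chosen only to illustrate the ranking reversal described above; they are not taken from the thesis.

```python
# Sketch of a population-corrected accuracy, ASSUMING the correction is
#   corrected_accuracy = sum_k (population share of grade k) * (recall of grade k).
# This form is an assumption; the abstract does not state the thesis's formula.

def per_class_recall(confusion):
    """Recall per grade. Row k holds true grade k, column j holds predicted grade j."""
    return [row[k] / sum(row) for k, row in enumerate(confusion)]

def sample_accuracy(confusion):
    """Plain accuracy on the test sample: correct predictions / all predictions."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total

def corrected_accuracy(confusion, population_shares):
    """Accuracy expected on a population with the given grade shares."""
    recalls = per_class_recall(confusion)
    return sum(p * r for p, r in zip(population_shares, recalls))

# Hypothetical balanced test sample: 100 loans in each of three grades.
model_a = [[90, 10, 0], [20, 60, 20], [10, 40, 50]]
model_b = [[70, 20, 10], [15, 70, 15], [10, 20, 70]]

# Hypothetical unbalanced population: most loans fall in the first grade.
shares = [0.7, 0.2, 0.1]

for name, cm in (("A", model_a), ("B", model_b)):
    print(f"model {name}: sample accuracy {sample_accuracy(cm):.3f}, "
          f"corrected accuracy {corrected_accuracy(cm, shares):.3f}")
# model A: sample accuracy 0.667, corrected accuracy 0.800
# model B: sample accuracy 0.700, corrected accuracy 0.700
```

With these made-up numbers, model B wins on the balanced sample but model A wins once the grades are reweighted to the population shares. A model ranking is robust in the abstract's sense when both measures order the models the same way.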
Keywords/Search Tags: individual credit evaluation model, multi-classification, machine learning, Simpson's paradox, accuracy