
Personal Credit Score Modeling And Analysis Based On Data Mining

Posted on: 2017-05-06 | Degree: Master | Type: Thesis
Country: China | Candidate: K Liu
GTID: 2308330488982418 | Subject: Applied Statistics
Abstract/Summary:
As the economy continues to develop, household demand for credit to finance housing, cars, education, and daily consumption keeps growing. Avoiding potential personal credit risk is therefore a major challenge for banks and other credit institutions. With data mining techniques, a model can be built to predict the probability that a personal loan will default, and the resulting individual credit score allows defaults to be predicted more accurately.

Predicting personal loan default is essentially a classification problem: the model must assign individual consumers to one of two classes, those who repay on schedule ("good" customers) and those who default ("bad" customers). For this problem we build classifiers with Logistic regression and with a decision tree, compare the advantages and disadvantages of the two, and select the better model.

The empirical study uses data from a Kaggle competition and is carried out with SAS and SPSS. First, in SAS, the raw data are randomly divided into a training set, a validation set, and a test set. The data are then pre-processed: missing values, outliers, and multicollinearity are checked, and imputation and variable cluster analysis are applied accordingly to screen the variables. Of the ten variables x1-x10, five (x1, x2, x4, x8, x9) are retained for Logistic regression modeling. The full-model method of Logistic regression analysis then yields three candidate models; parameter estimation and model significance tests on these candidates leave two prediction models, each with an AUC of 0.714, which indicates reasonably good predictive performance. To choose a robust and simple final model, ROC curves are drawn and AUC values computed on the validation set, where both models exceed 0.70. After an overall comparison, the Logistic regression model built on x2, x8, and x9 is selected.

Next, an Exhaustive CHAID decision tree classification model is built on the training set in SPSS; it selects five variables, x1, x3, x4, x7, and x9. Checking its robustness on the validation set gives an AUC of 0.839, which indicates that the model is robust. Finally, the predictions of the Logistic regression model and the decision tree model are compared on the test set. The sums of squared errors between the predicted default probability p and the actual value are 823.298 for the Logistic regression model and 231.559 for the decision tree model, so the decision tree model is better than the Logistic regression model in both predictive accuracy and robustness.
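
The two evaluation measures used in the abstract, AUC and the sum of squared errors between the predicted default probability p and the actual outcome, can be illustrated with a minimal sketch. This is not the thesis's SAS or SPSS code: the function names `auc` and `sum_squared_error` and the toy data are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than by drawing a ROC curve.

```python
from typing import Sequence


def auc(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """AUC as the share of (bad, good) customer pairs in which the bad
    customer (label 1) receives the higher predicted default probability.
    Tied predictions count as half a correctly ordered pair."""
    bad = [p for y, p in zip(y_true, p_hat) if y == 1]
    good = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one example of each class")
    ordered = 0.0
    for p_bad in bad:
        for p_good in good:
            if p_bad > p_good:
                ordered += 1.0
            elif p_bad == p_good:
                ordered += 0.5
    return ordered / (len(bad) * len(good))


def sum_squared_error(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Sum of squared differences between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat))


# Hypothetical toy data: 1 = default ("bad"), 0 = repaid ("good").
y = [0, 0, 1, 1, 0, 1]
p_model_a = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]  # illustrative scores only
p_model_b = [0.30, 0.60, 0.40, 0.50, 0.50, 0.45]  # illustrative scores only

for name, p in (("model A", p_model_a), ("model B", p_model_b)):
    print(name, "AUC:", round(auc(y, p), 3), "SSE:", round(sum_squared_error(y, p), 4))
```

A higher AUC and a lower sum of squared errors both point to the better model, which is the sense in which the abstract ranks the decision tree above the Logistic regression model. The sum of squared errors grows with the number of test cases, so it is only comparable between models scored on the same test set.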
Keywords/Search Tags: Data mining, personal credit ratings, Logistic regression, decision tree classification