Application Research Of Data Mining Technology In Personal Credit Score Prediction

Posted on:2023-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

GTID:2558306848968329

Subject:Applied Statistics

Abstract/Summary:

The growth of the consumer credit industry promotes economic development but also brings credit risk. Banks and financial institutions hope to extract valuable information from the massive volume of customer data and use it to assess customers' credit ratings, so as to avoid credit risk effectively. How to apply data mining technology to personal credit scoring models and improve their prediction performance has therefore become an important research direction. In this paper, the credit data set is obtained from the Data Castle big data competition platform. First, through exploratory data analysis, descriptive statistics are computed for each feature variable, the coverage rate and missing rate are calculated, and the quality of the data is preliminarily checked. Second, the problems associated with missing values are reviewed, and three methods are selected to handle the missing data in the original data set: K-nearest neighbor imputation, multivariate feature imputation, and random forest imputation. Each imputed data set is split into a training set and a test set, a decision tree classifier is fitted on the training set and used to predict the test set, and the classification accuracy obtained with the three methods is compared. The experiment shows that multivariate feature imputation is slightly better than the other two. Then, an improved Boruta feature selection algorithm is proposed, and its feasibility is verified using data from the UCI Machine Learning Repository and the decision tree algorithm. The improved method is applied to the credit data set and, combined with the WOE (weight of evidence) binning and IV (information value) results, the most suitable feature subset is selected for modeling. Finally, the credit data set is divided into a 70% training set and a 30% test set. Personal credit score prediction models are built using logistic regression, the traditional credit scoring method, and the XGBoost algorithm, a data mining technique. The prediction performance of the models is evaluated using accuracy, recall, the ROC curve, and the AUC value. The AUC value of logistic regression is 0.45, and the AUC value of the XGBoost algorithm is 0.89. The experimental results show that the XGBoost model based on the improved Boruta feature selection algorithm has better prediction performance.
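The abstract refers to WOE binning and IV values without giving formulas. As a rough illustration only, the sketch below computes weight of evidence and information value for a feature that has already been binned. It is not the thesis's implementation: the function name `woe_iv`, the `smoothing` parameter, and the sign convention (WOE as the log of the good share over the bad share, with label 1 meaning a bad customer) are all assumptions made here for the example.

```python
import math


def woe_iv(bins, labels, smoothing=0.5):
    """Return ({bin: WOE}, IV) for an already-binned feature.

    bins      -- one bin identifier per customer (hypothetical input format)
    labels    -- 1 for a bad (defaulting) customer, 0 for a good one (assumed coding)
    smoothing -- count added to every cell so empty cells do not produce log(0);
                 this is an assumption of the sketch, not taken from the thesis
    """
    good, bad = {}, {}
    for b, y in zip(bins, labels):
        target = bad if y == 1 else good
        target[b] = target.get(b, 0) + 1

    all_bins = sorted(set(good) | set(bad))
    total_good = sum(good.values()) + smoothing * len(all_bins)
    total_bad = sum(bad.values()) + smoothing * len(all_bins)

    woe, iv = {}, 0.0
    for b in all_bins:
        good_share = (good.get(b, 0) + smoothing) / total_good
        bad_share = (bad.get(b, 0) + smoothing) / total_bad
        # Assumed convention: WOE = ln(share of goods / share of bads) in the bin.
        woe[b] = math.log(good_share / bad_share)
        # IV sums the share difference weighted by WOE; every term is non-negative.
        iv += (good_share - bad_share) * woe[b]
    return woe, iv


# Toy data: the "low" bin is mostly good customers, the "high" bin mostly bad ones.
toy_bins = ["low"] * 4 + ["high"] * 4
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_woe, toy_iv = woe_iv(toy_bins, toy_labels)
```

Under this convention a positive WOE marks a bin dominated by good customers, and a feature with a larger IV separates good from bad customers more strongly, which is the sense in which IV can support feature selection. With `smoothing=0.0` the function reduces to the unsmoothed formulas but fails on any bin that contains only one class.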

Keywords/Search Tags:

Credit score, missing value imputation, improved Boruta, logistic regression, XGBoost algorithm

Related items

1	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
2	Research On Credit Scoring Model Based On Machine Learning
3	Design And Analysis Of Personal Credit Scorecard Based On Logistic Regression
4	The Online Imputation Method Of Missing Value Based On KNN And Its Application In Credit Evaluation
5	Nonparametric Imputation For Missing Data
6	Studies On Missing Data Imputation
7	Credit Risk Assessment Based On CatBoost Fusion Algorithm Evaluation And Model Research
8	Prediction Of Personal Credit Default Risk Based On Machine Learning
9	The Analysis And Improvement Research Of Knn-imputation Algorithm
10	Research On Imputation Method For Clustering Data With Missing Values