
Personal Credit Evaluation Method Based On Integration Of XGBoost And LR

Posted on: 2022-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: H L He
GTID: 2518306554482674
Subject: Computer technology
Abstract/Summary:
The innovation of Internet lending platforms and changing attitudes toward personal consumption have driven the growth of domestic credit business, and more and more credit products are entering people's lives. However, as the credit business expands, borrowers who cannot repay principal and interest on time expose financial institutions to large risks and business losses. An effective personal credit evaluation system helps control this potential personal credit risk, which is of practical importance both to financial institutions themselves and to a credit-based society. Machine learning methods are now commonly used for personal credit risk evaluation.

This paper builds an ensemble model from XGBoost and logistic regression (LR). First, the SMOTE method is used to address the imbalance between positive and negative samples in the data set. The data set is then divided into a training set and a test set, and an XGBoost model is trained. Each sample is mapped to the leaf nodes it reaches in the XGBoost boosted trees, and these leaf nodes form a new feature vector. The new features are fused with the original feature variables and fed into a logistic regression model, which is trained as the sub-classifier that outputs the final ensemble classification result.

Building on an in-depth analysis of current personal credit risk assessment techniques, this paper uses data from the LendingClub online lending platform: about 1 million real customer transactions from 2018 to 2019, each described by 149 feature variables. After data preprocessing, feature engineering and feature importance analysis, 15 feature variables were selected for model training, and the proposed model was compared with the ensemble learning algorithms random forest, GBDT and XGBoost on accuracy and AUC. The experimental results show that the proposed ensemble model achieves an AUC of 0.932, which is 0.049 higher than random forest, 0.025 higher than GBDT and 0.022 higher than XGBoost. Its accuracy is 0.925, which is 0.052 higher than random forest and 0.022 higher than XGBoost. In summary, on this data set the proposed model improves both classification ability and prediction accuracy, and it offers a new approach for research on personal credit risk assessment.
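The abstract does not give the SMOTE settings that were used. The following is a minimal from-scratch Python sketch of the standard SMOTE idea, written for illustration rather than taken from the thesis: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the neighbour count `k=5` and the `seed` parameter are assumptions; in practice a library implementation such as `SMOTE` from the imbalanced-learn package would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority-class samples (SMOTE-style sketch).

    minority: list of numeric feature vectors from the minority class only.
    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)

    # k nearest minority neighbours of every minority sample (Euclidean).
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbours[i])     # one of its nearest neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        base, other = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy minority class with two rows.
    minority = [[0.0, 1.0], [1.0, 3.0]]
    for row in smote(minority, n_new=4, k=1):
        print(row)
```

Because every new point is an interpolation between two real minority samples, SMOTE adds variety without simply duplicating rows. The abstract applies SMOTE before the train/test split; a common alternative is to resample only the training portion, so that synthetic points derived from test samples cannot influence test scores.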
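The step that turns XGBoost output into new features can be pictured with a small sketch. Here two tiny hand-written trees are hypothetical stand-ins for a trained XGBoost model (no training is shown). Each sample is routed to one leaf per tree, that leaf is one-hot encoded, and the codes are appended to the original features. The tree layout, the "go left when x < threshold" convention and all names (`TREES`, `leaf_index`, `fused_features`) are assumptions for illustration. With the real xgboost package, leaf indices can be obtained with `XGBClassifier.apply` or `Booster.predict(..., pred_leaf=True)`; those indices are node ids rather than consecutive numbers, so they are usually passed through a one-hot encoder.

```python
# Hypothetical stand-ins for trained XGBoost trees. Internal nodes send a
# sample left when x[feature] < threshold; leaves carry consecutive ids.
TREES = [
    {"feature": 0, "threshold": 0.5,
     "left": {"leaf": 0},
     "right": {"feature": 1, "threshold": 2.0,
               "left": {"leaf": 1}, "right": {"leaf": 2}}},
    {"feature": 1, "threshold": 1.0,
     "left": {"leaf": 0}, "right": {"leaf": 1}},
]
N_LEAVES = [3, 2]  # number of leaves in each tree above


def leaf_index(node, x):
    """Follow the split rules from the root; return the id of the leaf reached."""
    while "leaf" not in node:
        go_left = x[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def one_hot_leaves(trees, n_leaves, x):
    """One-hot encode the leaf reached in each tree and concatenate the codes."""
    encoded = []
    for tree, n in zip(trees, n_leaves):
        code = [0.0] * n
        code[leaf_index(tree, x)] = 1.0
        encoded.extend(code)
    return encoded


def fused_features(trees, n_leaves, x):
    """Original features followed by the tree-derived leaf features."""
    return list(x) + one_hot_leaves(trees, n_leaves, x)


if __name__ == "__main__":
    print(fused_features(TREES, N_LEAVES, [0.7, 3.0]))
```

For the sample [0.7, 3.0], the first tree ends in leaf 2 of 3 and the second in leaf 1 of 2, so the fused vector is [0.7, 3.0, 0.0, 0.0, 1.0, 0.0, 1.0]. The leaf codes act as learned feature crosses: each one marks a combination of split conditions that the boosted trees found useful.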
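In the final stage, logistic regression is trained on the fused feature vectors. A minimal sketch follows, assuming plain full-batch gradient descent on the log-loss with an optional L2 penalty. The learning rate, epoch count and function names are illustrative choices, not settings reported in the thesis; in practice a library solver such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500, l2=0.0):
    """Fit weights w and bias b by full-batch gradient descent.

    Minimises mean log-loss plus an optional (l2 / 2) * ||w||^2 penalty;
    the bias is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (e.g. default)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical toy data where the label depends only on feature 0.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    y = [0, 0, 1, 1]
    w, b = train_logistic(X, y)
    print([round(predict_proba(w, b, x), 3) for x in X])
```

Logistic regression keeps the second stage simple and its weights interpretable: each weight shows how strongly an original feature or a particular tree leaf pushes the predicted default probability up or down.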
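The models are compared by accuracy and AUC. The sketch below computes both metrics from true labels and predicted probabilities. The 0.5 decision threshold for accuracy is an assumption, since the abstract does not state one. AUC is computed as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. scikit-learn provides the same metrics as `accuracy_score` and `roc_auc_score`.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, scores))
    return hits / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 1/2.

    This O(P * N) pairwise form is fine for illustration; rank-based
    formulas scale better on large data sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.1]
    print(accuracy(y_true, scores), roc_auc(y_true, scores))  # 0.75 0.75
```

Unlike accuracy, AUC does not depend on any particular threshold. It measures only how well the model ranks defaulters above non-defaulters, which makes it a useful companion metric when classes are imbalanced.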
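The abstract reports only differences from the baselines, but the baseline scores follow directly from those differences. No accuracy difference is given for GBDT, so that value cannot be recovered:

```latex
\begin{aligned}
\text{AUC}_{\text{RF}}      &= 0.932 - 0.049 = 0.883\\
\text{AUC}_{\text{GBDT}}    &= 0.932 - 0.025 = 0.907\\
\text{AUC}_{\text{XGBoost}} &= 0.932 - 0.022 = 0.910\\
\text{Acc}_{\text{RF}}      &= 0.925 - 0.052 = 0.873\\
\text{Acc}_{\text{XGBoost}} &= 0.925 - 0.022 = 0.903
\end{aligned}
```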
Keywords/Search Tags: Personal credit risk, Ensemble algorithm, Imbalanced data, Feature selection