Research On Credit Risk Control On Machine Learning

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Liu
Full Text: PDF
GTID: 2428330614965709
Subject: computer technology
Abstract/Summary:
With the spread of the "Internet+" concept, China's Internet finance industry is developing rapidly and the market share of personal credit business keeps growing, which makes business data increasingly complex and diverse. Traditional credit risk control mostly relies on model-driven strategies, which can no longer meet the demands of default risk prediction; the result is frequent default events and large losses for institutions. It is therefore necessary to introduce machine learning algorithms to improve the credit risk control mechanism and promote the healthy and sustainable development of the credit market.

This paper uses machine learning algorithms to address two problems in credit risk control. First, when a new credit product launches there is no accumulated business history: only a small amount of labeled data and a large amount of unlabeled data are available, so a data-driven supervised credit risk control model cannot be built. Second, once a credit product has been live for some time and a certain amount of data has accumulated, most institutions use Logistic Regression (LR) for credit risk control modeling. LR is simple, easy to implement, and fast to train. However, it is a linear model with limited learning capacity, so it cannot learn nonlinear relationships between features. Experienced risk control engineers must construct feature combinations by hand, which incurs high labor costs.

To address these problems, the main work of this paper is as follows:

(1) For the problem that a data-driven supervised credit risk control model cannot be built at the initial stage of a credit product launch, this paper proposes a cold-start method based on the Dirichlet process mixture model (DPMM) and isolation forest (IForest). In this method, DPMM is used to calculate the default similarity of unlabeled samples, and IForest is used to calculate their default anomaly. By combining default similarity and default anomaly, reliable normal samples and potential default samples are screened out, providing sufficient samples for training the subsequent supervised model.

(2) For the problem that a single LR model cannot learn nonlinear relationships between data features at the later stage of a credit product launch, this paper proposes a Bagging-based ensemble of XGBoost-LR models. In this method, eXtreme Gradient Boosting (XGBoost) is used for feature transformation, and the output of its leaf nodes is taken as the input of the LR model, which improves LR's ability to learn nonlinear data features. In addition, a Bagging mechanism perturbs the row-sampling and column-sampling parameters of XGBoost, and multiple XGBoost-LR fusion models are built to further improve prediction capability.

To verify the effectiveness of the two methods, this paper runs experiments on the desensitized credit data set of an Internet finance company and on several UCI data sets. To demonstrate the practicality of the methods, this paper also designs a credit risk control system.
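The cold-start screening in (1) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes scikit-learn, uses `BayesianGaussianMixture` with a Dirichlet-process prior as a truncated stand-in for the DPMM, assumes the mixture is fitted on the few known default samples, and combines the two scores by averaging rank percentiles. The function name `screen_unlabeled`, the `top_frac` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import BayesianGaussianMixture


def screen_unlabeled(X_default, X_unlabeled, top_frac=0.1, seed=0):
    """Split off likely-normal and likely-default unlabeled samples."""
    # Truncated Dirichlet-process Gaussian mixture fitted on known defaults
    # (assumption: "default similarity" = log-likelihood under this mixture).
    dpmm = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    ).fit(X_default)
    similarity = dpmm.score_samples(X_unlabeled)

    # Isolation forest fitted on the unlabeled pool. score_samples is lower
    # for more anomalous points, so negate it to get an anomaly score.
    forest = IsolationForest(n_estimators=100, random_state=seed).fit(X_unlabeled)
    anomaly = -forest.score_samples(X_unlabeled)

    # Assumption: combine both signals by averaging their rank percentiles.
    def pct_rank(v):
        return np.argsort(np.argsort(v)) / (len(v) - 1)

    score = 0.5 * pct_rank(similarity) + 0.5 * pct_rank(anomaly)

    k = max(1, int(top_frac * len(X_unlabeled)))
    order = np.argsort(score)
    # Lowest scores: reliable normal samples; highest: potential defaults.
    return order[:k], order[-k:], score


# Synthetic illustration: 200 normal points, 20 hidden defaults.
rng = np.random.default_rng(0)
X_default = rng.normal(4.0, 0.5, size=(40, 2))
X_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(4.0, 0.5, size=(20, 2)),
])
is_default = np.r_[np.zeros(200), np.ones(20)]
normal_idx, default_idx, score = screen_unlabeled(X_default, X_unlabeled)
```

The two index sets could then serve as pseudo-labeled training data for a supervised model. The equal weighting of the two scores is only one possible combination rule.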
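The leaf-encoding idea in (2) can be sketched as follows. To keep the example dependent on scikit-learn only, it swaps in `GradientBoostingClassifier` for XGBoost; its `subsample` and `max_features` parameters play the role of the row- and column-sampling rates. The class `LeafLR`, the helpers `fit_bagged` and `predict_bagged`, the sampling ranges, and the averaging of member outputs are assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder


class LeafLR:
    """Boosted trees emit leaf indices; LR is trained on their one-hot codes."""

    def __init__(self, subsample, max_features, seed):
        self.gbdt = GradientBoostingClassifier(
            n_estimators=30, max_depth=3, subsample=subsample,
            max_features=max_features, random_state=seed)
        self.encoder = OneHotEncoder(handle_unknown="ignore")
        self.lr = LogisticRegression(max_iter=1000)

    def _leaves(self, X):
        # apply() gives the leaf reached in every tree; flatten to 2-D.
        return self.gbdt.apply(X).reshape(len(X), -1)

    def fit(self, X, y):
        self.gbdt.fit(X, y)
        self.lr.fit(self.encoder.fit_transform(self._leaves(X)), y)
        return self

    def predict_proba(self, X):
        codes = self.encoder.transform(self._leaves(X))
        return self.lr.predict_proba(codes)[:, 1]


def fit_bagged(X, y, n_models=5, seed=0):
    """Train several tree+LR pairs with perturbed sampling rates."""
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):
        member = LeafLR(subsample=float(rng.uniform(0.6, 0.9)),
                        max_features=float(rng.uniform(0.6, 0.9)),
                        seed=seed + i)
        models.append(member.fit(X, y))
    return models


def predict_bagged(models, X):
    # Assumption: combine members by averaging predicted probabilities.
    return np.mean([m.predict_proba(X) for m in models], axis=0)


# Synthetic XOR-style data that a plain linear model cannot separate.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
models = fit_bagged(X[:400], y[:400])
proba = predict_bagged(models, X[400:])
accuracy = np.mean((proba > 0.5) == y[400:])
```

In practice the LR stage is often fitted on data held out from tree training to limit overfitting; the sketch reuses the same rows for brevity.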
Keywords/Search Tags: Dirichlet Process Mixture Model, Isolation Forest, Logistic Regression, XGBoost, Credit Risk Control