Research On Credit Risk Control On Machine Learning

Posted on:2021-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Liu

Full Text:PDF

GTID:2428330614965709

Subject:computer technology

Abstract/Summary:

With the popularization of the "Internet+" concept, China's Internet finance industry is developing rapidly and the market share of personal credit business is growing quickly, which makes business data complex and diverse. Traditional credit risk control relies mostly on model-driven strategies, which can no longer meet the demands of default risk prediction; as a result, default events occur frequently and cause large losses to institutions. It is therefore necessary to introduce machine learning algorithms to improve the credit risk control mechanism and promote the healthy and sustainable development of the credit market.

This paper uses machine learning algorithms to solve two problems in credit risk control scenarios. First, in the initial stage of a new credit product there is no accumulated business history: only a small amount of labeled data and a large amount of unlabeled data are available, so a data-driven supervised credit risk control model cannot be established. Second, after a credit product has been on the market for some time and a certain amount of data has accumulated, most institutions use Logistic Regression (LR) for credit risk modeling. The LR model is simple, easy to implement, and fast to train. However, it is a linear model with limited learning capacity and cannot learn non-linear relationships between features, so experienced risk control engineers must combine features by hand, which incurs considerable labor cost.

To address these problems, the main work of this paper is as follows:

(1) Because a data-driven supervised credit risk control model cannot be built at the initial stage of a credit product launch, this paper proposes a cold-start method based on the Dirichlet process mixture model (DPMM) and isolation forest (IForest). DPMM is used to calculate the default similarity of unlabeled samples, and IForest is used to calculate their default anomaly. By combining default similarity and default anomaly, reliable normal samples and potential default samples are screened out, providing sufficient samples for subsequent supervised model training.

(2) Because a single LR model cannot learn non-linear relationships between data features at the later stage of a credit product launch, this paper proposes a Bagging-based ensemble of XGBoost-LR models. eXtreme Gradient Boosting (XGBoost) is used for feature transformation, and the outputs of its leaf nodes are taken as the input to the LR model, improving LR's ability to learn non-linear feature relationships. In addition, a Bagging mechanism perturbs XGBoost's row-sampling and column-sampling parameters to build multiple XGBoost-LR fusion models, further improving predictive capability.

To verify the effectiveness of the two methods, this paper runs experimental simulations on a desensitized credit data set from an Internet finance company and on several UCI data sets. To demonstrate the practicality of the design, the paper also designs a credit risk control system.
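The cold-start screening in (1) can be sketched as follows, assuming scikit-learn: `BayesianGaussianMixture` with `weight_concentration_prior_type="dirichlet_process"` serves as a truncated DPMM, and `IsolationForest` supplies the anomaly score. The abstract does not say how default similarity is computed or how the two scores are combined, so both rules here are assumptions: a sample's default similarity is the expected default rate of the mixture components it falls into (rates estimated from the few labeled samples), and the fused risk averages that similarity with the rank-normalized anomaly score. `cold_start_screen`, `rank01`, the quantile `q`, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import BayesianGaussianMixture


def rank01(s):
    """Map scores to their rank scaled into [0, 1]."""
    return np.argsort(np.argsort(s)) / max(len(s) - 1, 1)


def cold_start_screen(X_lab, y_lab, X_unlab, n_components=10, q=0.1, seed=0):
    """Hypothetical helper: return (reliable_normal_idx, potential_default_idx) into X_unlab.

    The similarity and fusion rules below are illustrative assumptions.
    """
    X_all = np.vstack([X_lab, X_unlab])

    # DPMM: truncated Dirichlet-process Gaussian mixture fitted by variational inference.
    dpmm = BayesianGaussianMixture(
        n_components=n_components,
        weight_concentration_prior_type="dirichlet_process",
        max_iter=500,
        random_state=seed,
    ).fit(X_all)

    # Default rate per component from the labeled samples' soft assignments,
    # smoothed toward the overall labeled default rate (also covers empty components).
    resp_lab = dpmm.predict_proba(X_lab)
    mass = resp_lab.sum(axis=0)
    defaults = resp_lab[y_lab == 1].sum(axis=0)
    comp_rate = (defaults + y_lab.mean()) / (mass + 1.0)

    # Default similarity: expected component default rate under each sample's posterior.
    similarity = dpmm.predict_proba(X_unlab) @ comp_rate

    # Default anomaly: score_samples is higher for normal points, so negate it.
    iforest = IsolationForest(random_state=seed).fit(X_all)
    anomaly = -iforest.score_samples(X_unlab)

    # Fuse: similarity is already in [0, 1]; anomaly is rank-normalized to [0, 1].
    risk = 0.5 * similarity + 0.5 * rank01(anomaly)
    order = np.argsort(risk)
    k = max(1, int(q * len(order)))
    return order[:k], order[-k:]


# Synthetic demo: normal borrowers near the origin, defaulters in a separate cluster.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(4, 1, (5, 2))])
y_lab = np.array([0] * 15 + [1] * 5)
X_unlab = np.vstack([rng.normal(0, 1, (270, 2)), rng.normal(4, 1, (30, 2))])
y_hidden = np.array([0] * 270 + [1] * 30)  # used only to check the screening

normal_idx, default_idx = cold_start_screen(X_lab, y_lab, X_unlab)
print("default share among 'potential default':", y_hidden[default_idx].mean())
print("default share among 'reliable normal':", y_hidden[normal_idx].mean())
```

The screened index sets would then be labeled 0 and 1 and added to the small labeled set to train a supervised model; the quantile `q` trades the number of pseudo-labeled samples against their reliability.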
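The Bagging-based tree-leaf plus LR ensemble in (2) can be sketched as below. Assumptions: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the block needs no extra dependency (with the `xgboost` package, `XGBClassifier.apply` returns leaf indices in the same role, and `subsample` / `colsample_bytree` are the row and column sampling parameters). Here `subsample` is row sampling and `max_features` is column sampling, although sklearn samples columns per split rather than per tree. The sampling ranges, tree sizes, number of members, plain averaging, and the XOR-style synthetic data are illustrative, not the paper's settings; `TreeLeafLR`, `bagged_tree_lr`, and `ensemble_proba` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


class TreeLeafLR:
    """One member: boosted trees as a feature transformer, LR on one-hot leaf indices."""

    def __init__(self, subsample, max_features, seed):
        self.gbdt = GradientBoostingClassifier(
            n_estimators=50,
            max_depth=3,
            subsample=subsample,        # row sampling rate
            max_features=max_features,  # column sampling rate (per split in sklearn)
            random_state=seed,
        )
        self.enc = OneHotEncoder(handle_unknown="ignore")
        self.lr = LogisticRegression(max_iter=1000)

    def _leaves(self, X):
        # apply() returns shape (n_samples, n_estimators, 1) for binary targets.
        return self.gbdt.apply(X)[:, :, 0]

    def fit(self, X, y):
        self.gbdt.fit(X, y)
        self.lr.fit(self.enc.fit_transform(self._leaves(X)), y)
        return self

    def predict_proba(self, X):
        return self.lr.predict_proba(self.enc.transform(self._leaves(X)))[:, 1]


def bagged_tree_lr(X, y, n_models=5, seed=0):
    """Each member draws its own row/column sampling rates, then is fitted."""
    rng = np.random.default_rng(seed)
    models = []
    for b in range(n_models):
        subsample = rng.uniform(0.6, 1.0)
        max_features = rng.uniform(0.5, 1.0)
        models.append(TreeLeafLR(subsample, max_features, seed + b).fit(X, y))
    return models


def ensemble_proba(models, X):
    # Simple average of the members' default probabilities.
    return np.mean([m.predict_proba(X) for m in models], axis=0)


# Synthetic non-linear target: "default" when two features share a sign (XOR-like).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

plain_lr = LogisticRegression().fit(X_tr, y_tr)
auc_lr = roc_auc_score(y_te, plain_lr.predict_proba(X_te)[:, 1])
models = bagged_tree_lr(X_tr, y_tr)
auc_ens = roc_auc_score(y_te, ensemble_proba(models, X_te))
print(f"plain LR AUC: {auc_lr:.3f}  bagged tree-LR AUC: {auc_ens:.3f}")
```

The plain LR cannot represent the sign interaction, while the leaf indices encode feature combinations that the LR can then weight linearly. Training the trees and the LR on the same rows keeps the sketch short; fitting them on disjoint splits is a common way to keep the LR from overfitting leaves the trees memorized.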

Keywords/Search Tags:

Dirichlet Process Mixture Model, Isolation Forest, Logistic Regression, XGBoost, Credit Risk Control

Related items

1	Application Of Data Mining In Personal Credit Risk Identification Of P2P Online Loan
2	XGBoost-based Online Loan Risk Prediction
3	Study On The Risk Prediction Model Of User Loan Based On Machine Learning
4	Research On Image Segmentation Based On Dirichlet Process Mixture Model
5	Review Clustering Using Dirichlet Process Multinomial Mixture Models
6	An XGBoost-Based Ensemble Learning Approach To Personal Credit Risk Assessment
7	Research On Default Risk Identification Of Online Loan Based On Machine Learning Hybrid Model
8	Research On Credit Scoring Model Based On Machine Learning
9	Research On Pre-loan Risk Control Of Consumer Finance Based On Machine Learning
10	Research On Credit Risk Prediction Model Based On Machine Learning Technology