
Research On Machine Learning Algorithm For Credit Risk Assessment In Consumer Finance

Posted on: 2022-07-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Kang
Full Text: PDF
GTID: 1528307154967219
Subject: Management Science and Engineering
Abstract/Summary:
Consumer finance is an important tool for stimulating residents' consumption potential. As an important part of inclusive finance in China, its development can raise the living standards of Chinese people and upgrade the mode of economic growth; it is also necessary for China's construction of a modern society. In recent years, a large number of consumer finance, microfinance and financial technology companies have risen rapidly, providing consumers with a variety of loan products. However, although the industry has developed rapidly, the complexity of the consumer population and the lack of accurate and efficient risk assessment methods have allowed large financial risks to accumulate. Credit default events occur from time to time, and the industry's overall default rates are increasing rapidly. Risk management is therefore vital to the sustainable development of the consumer finance industry.

With the promulgation of the national fintech programme, most institutions have begun to deploy big data and machine learning techniques for credit risk assessment, striving to improve the efficiency of loan approval and to achieve intelligent risk control and scientific credit management. However, as internet and big data technology is applied ever more deeply, the explosive growth of credit data also poses many new challenges for credit scoring modelling. First, loan default depends on a variety of uncertain factors, and massive data has diverse dimensions, complex relationships and authenticity that is difficult to guarantee, so a great deal of time is spent on data preprocessing and on discovering important risk features. In the real-world development of credit scoring models, the whole feature engineering process is complex and inefficient. Second, the data has missing labels, so it is difficult for the model to learn the statistical characteristics of rejected samples. Third, the class distribution of the data samples is imbalanced, which affects the parameter estimation of the prediction model. In practice, it is the data that determines the upper limit of the predictive ability of a credit scoring model. These data-related problems limit, to a certain extent, the effective application of machine learning algorithms to consumer finance credit risk assessment.

Based on a summary of practical industry experience and the existing research literature, this thesis studies machine learning algorithms for consumer finance credit risk assessment from the perspective of data. The thesis comprises three studies:

(1) To address the complex and inefficient feature engineering approaches used in developing credit scoring models, a feature engineering method for structured data in consumer finance is studied. Firstly, by summarizing and refining the existing unsystematic feature engineering methods, a feature engineering technical framework for structured data in consumer finance is proposed. The framework includes five parts: data preprocessing, feature construction, feature extraction, feature selection and feature monitoring. Secondly, for the key feature construction part of this framework, an automatic feature construction algorithm for structured data (AFCA) is proposed, which automates feature construction and overcomes the difficulty of constructing features manually. Thirdly, for the feature selection part of this framework, an ensemble-learning-based hybrid feature selection algorithm (EHFS), informed by abundant practical experience, is proposed to achieve rapid selection of effective features. Experimental results showed that these two methods improve the predictive ability of models on a real credit data set. This study improves the efficiency of credit scoring model development, provides a fundamental technology for automatic feature engineering applications in intelligent risk control practice, and provides underlying technical support for data asset management in the consumer finance industry.

(2) To address sample selection bias and imbalanced class distribution in massive credit data, a graph-based semi-supervised reject inference framework that accounts for imbalanced data distribution in credit scoring is proposed. This method solves the reject inference problem under imbalanced distribution by integrating the Borderline-SMOTE algorithm with a label spreading algorithm based on Mahalanobis distance. By introducing tree-based ensemble learning models such as XGBoost and LightGBM, a multi-model framework for credit scoring is constructed. Detailed experimental evaluation showed that this method predicts better than many traditional models. This study not only provides a new direction for academic research on credit risk management, but also provides theoretical guidance for the design of practical intelligent risk control applications in consumer finance.

(3) To address complex feature engineering, sample selection bias and imbalanced distribution in high-dimensional credit data, a CWGAN-GP-based multi-task learning model for credit risk assessment is studied. This method learns the statistical distribution of the whole loan customer population through a conditional GAN based on Wasserstein distance with a gradient penalty term, and then oversamples the accepted samples. A deep multi-task learning model is then trained on the rebalanced accepted samples and the rejected samples at the same time, which makes the rejected samples in credit data usable and offers a new idea for solving the reject inference problem. In addition, word-embedding technology is introduced to extract features efficiently from the original features, reducing the complexity of feature engineering. Rigorous experiments showed that the proposed model effectively improves the ability to predict a borrower's credit risk. This study not only enriches the research on credit risk assessment methods, but also creates opportunities for the development of more systematic and intelligent risk control applications.

Addressing the scientific problems specific to consumer finance in the FinTech era, and building on machine learning methodologies, this thesis studies a variety of credit risk assessment theories and methods, deals with some vital problems in consumer credit scoring practice, and enriches the methods of credit risk management. These studies have notable theoretical and practical value for credit risk management in the consumer finance industry.
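
To make the graph-based reject inference idea in study (2) concrete, the following is a minimal sketch of label spreading on a toy data set. It is not the thesis's implementation, and it rests on several assumptions: it uses a Euclidean RBF affinity in place of the Mahalanobis distance named in the abstract, it omits the Borderline-SMOTE rebalancing step, and the function name `label_spreading`, its parameters (`alpha`, `gamma`, `n_iter`) and the toy data are all hypothetical. Accepted applicants carry a label (0 = repaid, 1 = defaulted), and rejected applicants are marked `None`.

```python
import math

def label_spreading(points, labels, alpha=0.8, gamma=1.0, n_iter=50):
    """Infer labels for unlabeled (rejected) samples by spreading labels over a graph.

    points: list of feature tuples.
    labels: 0 or 1 for accepted samples, None for rejected samples.
    Assumption: Euclidean RBF affinity, not the Mahalanobis distance of the thesis.
    """
    n = len(points)
    # Affinity matrix W with zero diagonal: w_ij = exp(-gamma * ||x_i - x_j||^2).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-gamma * d2)
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2), where D holds the row sums of W.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # One-hot initial label matrix; rows of unlabeled samples are all zeros.
    y0 = [[1.0 if labels[i] == c else 0.0 for c in (0, 1)] for i in range(n)]
    f = [row[:] for row in y0]
    # Iterate F <- alpha * S * F + (1 - alpha) * Y0.
    for _ in range(n_iter):
        f = [
            [
                alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1 - alpha) * y0[i][c]
                for c in (0, 1)
            ]
            for i in range(n)
        ]
    # Assign each sample the class with the larger score (ties go to class 0).
    return [0 if row[0] >= row[1] else 1 for row in f]

# Toy data (hypothetical): four accepted applicants and two rejected ones.
points = [(0.0,), (0.2,), (2.0,), (2.2,), (0.1,), (2.1,)]
labels = [0, 0, 1, 1, None, None]
inferred = label_spreading(points, labels)
print(inferred)
```

In this toy setting each rejected applicant takes the label of the accepted applicants it sits closest to. A fuller pipeline along the lines of the abstract would first rebalance the accepted samples and then pass the inferred labels to a downstream classifier.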
Keywords/Search Tags: Consumer finance, Credit scoring, Feature engineering, Reject inference, Imbalanced learning, Generative adversarial network, Multi-task learning