Research On Credit Risk Control Model And Algorithm Based On Machine Learning

Posted on: 2023-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhu
GTID: 2568306791454534
Subject: Optical engineering
Abstract/Summary:
With the rapid development of the global economy, China has continued to open up to the outside world, and its market economic system has steadily improved. However, the financial industry also faces unprecedented challenges. For example, it is increasingly difficult to expand the bank loan business, and the business model of traditional commercial banks, which relies on net interest margins, is unsustainable. To find new sources of profit growth, commercial banks are gradually moving toward retail operations, which offer stable profitability, a massive customer base, and low capital occupation by intermediary business. As a result, retail business has become an important part of the growth of banking operations. As the core of retail business, personal loans are ubiquitous. Whether in daily life, commodity trading markets, bank lending, or capital market investment, the credit business is indispensable: it is the foundation of the market economy, characterized by a large and scattered customer base and a balance between risk and income. Therefore, the requirements for the accuracy of loan risk control are becoming stricter, early warning of potential defaults is needed, and the credit reporting system must be improved. This has gradually become the consensus of the financial industry.

To measure personal credit default risk at financial institutions, this thesis uses personal credit loan datasets published in domestic competitions as well as the borrower dataset of a well-known P2P platform, with Python as the tool for data analysis and model construction. The study summarizes modeling experience and combines it with professional knowledge of financial risk control, covering data preprocessing, data analysis, feature screening, dataset division, construction of machine learning models, and construction and validation of the evaluation system. It proposes meaningful methods and suggestions for financial institutions. The main research contents of this thesis are as follows:

(1) Data preprocessing and feature engineering. The analysis is based on personal credit loan data published in domestic competitions. First, the raw data is preprocessed, including analysis and filling of missing values, feature type conversion, feature encoding, and normalization. Next, feature derivation and feature selection are performed: the information value (IV) and the Pearson correlation coefficient are used for an initial screening of features, followed by a further screening based on LightGBM feature importance. Finally, the class imbalance of the real data is addressed by oversampling with the SMOTE algorithm. Specifically, SMOTE generates synthetic samples for the minority class of defaulted samples, so that the number of defaulted samples becomes roughly balanced with the number of normal samples.

(2) Construction of the loan risk control model. The thesis applies five machine learning algorithms, namely Logistic Regression, Naive Bayes, Decision Tree, XGBoost, and Multilayer Perceptron, and tunes their parameters with grid search to construct the base models for personal credit loans. Using the Stacking and Voting methods, several base models are combined to explore the prediction results of risk control models under different combinations. Based on the best-performing combination, the Voting-LR algorithm is proposed. On the one hand, the algorithm first acts as a feature processor to construct new features; on the other hand, it uses the resulting predicted probability values as a new feature input to train the loan risk control model.

(3) Empirical research on the loan risk control model. This thesis selects accuracy, precision, recall, the KS value, and the AUC value as indicators to evaluate model performance. The experimental results show that, among the five base personal credit loan models, XGBoost has the best prediction performance. Both fusion methods improve the accuracy of the base models to a certain extent, and overall Voting fusion performs better than Stacking. Under Stacking fusion, the best fusion model consists of XGBoost as the input layer and LR as the output layer. Under Voting fusion, the best fusion model consists of MLP, XGB, and LR. The Voting-LR algorithm outperforms all combinations of both the Stacking and Voting fusion methods, has a simple structure, and also performs well in interpretability. Finally, the Voting-LR algorithm is applied to the credit risk control system of a P2P online lending platform, where it effectively improves the accuracy of default prediction.

This thesis adopts machine learning algorithms and fusion models to predict and study real personal loan risk on lending platforms, and these models can effectively predict the likelihood of borrower default. The work has application value for credit risk control in financial institutions: it can effectively improve the predictive ability of their risk control systems and promote the healthy development of the domestic personal credit market.
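The abstract describes SMOTE as generating synthetic samples for the under-represented defaulted class. As a rough illustration of that idea only, the sketch below interpolates between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for this example; they are not taken from the thesis, which does not state which implementation it used.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from minority-class samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration, not the thesis implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four "defaulted" samples with two features each.
defaulted = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(defaulted, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the defaulted class rather than duplicating existing rows.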
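Two of the quantities named in the abstract, the information value (IV) used for feature screening and the KS value used for evaluation, have standard definitions in credit scoring. The sketch below computes both under those standard definitions; the function names, the pre-binned counts, and the toy scores are assumptions for illustration and do not reproduce any data or results from the thesis.

```python
import math


def information_value(good_counts, bad_counts):
    """IV of a binned feature: sum over bins of (pg - pb) * ln(pg / pb).

    good_counts[i] and bad_counts[i] are the numbers of non-defaulted and
    defaulted samples in bin i. Assumes every bin has non-zero counts.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pg = g / total_good
        pb = b / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv


def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of
    defaulters (label 1) and non-defaulters (label 0)."""
    n_bad = sum(y_true)
    n_good = len(y_true) - n_bad
    pairs = sorted(zip(scores, y_true), reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume all samples that share this score before comparing.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


# Toy inputs (hypothetical): three bins of a feature, and eight scored loans.
iv = information_value(good_counts=[40, 40, 20], bad_counts=[10, 30, 60])
ks = ks_statistic(
    y_true=[1, 1, 0, 1, 0, 0, 0, 0],
    scores=[0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
)
```

A feature whose bins have identical good and bad proportions yields an IV of zero, which is why a low IV is a common reason to drop a feature during initial screening.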
Keywords: Personal Loan, Default Prediction, Feature Selection, Logistic Regression, Model Fusion