
Application Of Data Mining In Personal Credit Risk Identification Of P2P Online Loan

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: M F Zhang
GTID: 2518306311996069
Subject: Applied Statistics
Abstract/Summary:
With advances in computer technology, the concept of "Internet+" has spread to many industries, and the combination of the Internet and the financial industry has given rise to Internet finance. Internet finance has developed greatly since it was introduced into China, and innovation in financial products has brought great convenience to daily life. To solve the financing problems faced by individuals and small businesses, online platforms were established to flexibly pool spare money from individuals and help those who are short of funds. Owing to their low threshold, low cost and high returns, P2P online platforms have been developing rapidly since 2013. However, because of the inadequate regulatory system in China, many P2P online loan companies have poor anti-risk capabilities, which leads to various risks in the P2P industry, such as illegal fundraising by platforms, violent debt collection, and malicious fraud by borrowers. All of these have a negative impact on the social economy. Risk control is therefore crucial for the P2P industry.

For P2P online loan companies, controlling external risks requires improving the market supervision system, banning platforms with insufficient qualifications, and ensuring that the industry operates legally and compliantly. Controlling internal risks requires establishing a strict system to control customers' credit risk, reducing the non-performing loan ratio, avoiding losses to investors, and improving the efficiency of companies. It is therefore necessary for P2P platforms to evaluate personal credit risk. To assess personal credit risk, each P2P online loan company establishes its own risk control system. The essence of such a system is to estimate how likely a borrower is to default and then to determine whether the borrower will default. From a theoretical perspective, this is a classification problem.

At present, commonly used classification methods are logistic regression, random forest, XGBoost, and deep learning. In this paper, three models (penalized logistic regression, random forest and XGBoost) are used to explore the data set. Among them, logistic regression is highly interpretable and stable, but its predictive performance is not ideal. Compared with logistic regression, random forest and XGBoost predict much better on the unbalanced data and can also output the importance of the predictor variables; of the two, XGBoost predicts better. The data set used in this paper contains many predictor variables, some of which are correlated with each other and some of which have little impact on the dependent variable. This results in redundant information in the data set, which affects the models' prediction accuracy. Because XGBoost, random forest and penalized logistic regression can all select the predictor variables that have an important impact on the dependent variable, and thereby reduce the dimension, this paper builds its personal credit risk assessment model mainly on these three models.

This paper first establishes single models: a logistic regression model, a random forest model and an XGBoost model. Comparing the single models shows that XGBoost predicts best. Then, according to the variable importance given by the XGBoost model, the categorical variable with a large impact on loan default is selected to build a grade-wise prediction model; that is, the best XGBoost model is established for each of the 7 grades. This grade-wise prediction model provides a new reference for personal credit risk identification. Finally, based on the advantages and disadvantages of the single models, this paper establishes a combined model of logistic regression and XGBoost, which ultimately improves prediction accuracy.

To identify loan customers with a higher risk of default, this paper uses data mining technology to establish a personal credit risk assessment model and finds that the loan amount, the loan interest rate, the personal credit rating, the income level and similar variables have a large impact on personal credit risk. Among the single models, XGBoost predicts best, but the logistic regression model is more stable and interpretable. The combined model predicts better than any single model. It can be used for a more rigorous and accurate assessment of personal credit risk, which helps to reduce the loan default rate, improve enterprises' risk prevention and control capabilities, ensure the integrity of the company's capital chain, improve enterprises' profitability, and promote the efficient and rapid development of P2P Internet finance.
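
The variable-selection role that the abstract attributes to penalized logistic regression can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state which penalty was used, so an L1 (lasso) penalty fitted by proximal gradient descent is assumed here, and the data, the function names (`fit_l1_logistic`, `make_toy_loans`) and the two columns (a standardized loan amount and an irrelevant variable) are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """Proximal gradient descent on mean log-loss + lam * sum(|w_j|).

    Assumption: L1 penalty; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted default probability and the label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= step * grad_b / n
        # Gradient step followed by soft-thresholding of each coefficient.
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def make_toy_loans(n=200, seed=0):
    """Hypothetical data: column 0 drives default, column 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        amount = rng.uniform(-1.0, 1.0)      # standardized loan amount (made up)
        irrelevant = rng.uniform(-1.0, 1.0)  # unrelated to the label
        X.append([amount, irrelevant])
        y.append(1 if amount + rng.uniform(-0.3, 0.3) > 0 else 0)
    return X, y


X, y = make_toy_loans()
weights, bias = fit_l1_logistic(X, y, lam=0.05)
print("weights:", weights, "intercept:", bias)
```

In this sketch the coefficient of the informative column stays clearly positive while the coefficient of the irrelevant column is shrunk toward zero. Raising `lam` shrinks more strongly, and a sufficiently large `lam` sets every weight to exactly zero, which is how the penalty acts as a dimension-reduction step.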
Keywords/Search Tags: penalty logistic regression, random forest, XGBoost, P2P personal credit risk