
Prediction Of Overdue Provident Fund Loans Based On Machine Learning Algorithms

Posted on: 2022-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
GTID: 2517306527952439
Subject: Applied Statistics
Abstract/Summary:
With the development of China's economy and the continuous improvement of the housing provident fund system, more and more people are using provident fund loans to buy homes, and banks' related credit business has grown rapidly. Because provident fund loans bear directly on people's livelihoods, keeping the system healthy and stable requires facing up to the potential risks in its operation and responding to them actively. One important aspect is the ability to evaluate the qualifications of loan applicants effectively. There is therefore an urgent need for an efficient and accurate model that predicts the risk that a client's loan becomes overdue. Such a model can serve as a reference for a bank's credit evaluation and help it avoid risk.

This paper uses real, desensitized data from a bank in Shandong Province to build a predictive model for overdue provident fund loans. First, the original data set is cleaned and preprocessed, and strongly correlated variables are eliminated through correlation analysis. New features are then constructed by combining variables according to actual business conditions, by data binning, and by computing statistical features, which expands the original data set with multiple new features. XGBoost is used for feature selection to eliminate the redundant features among those newly generated. The data are then standardized and one-hot encoded in preparation for modeling. Because the data set is imbalanced, traditional classifiers fail on it, so the optimal sampling method and sampling ratio are selected for each model to balance the data. For modeling, the paper uses single models, namely logistic regression, a support vector machine, and the ensemble learning models random forest and LightGBM, and also builds a two-stage combined model of logistic regression + random forest. The models are evaluated on 2(80), the AUC value and the MCC (Matthews correlation coefficient), and the best model is selected to predict overdue behavior.

The study finds that feature combination and data balancing effectively improve the classification performance of every model. Among the single models, logistic regression predicts the minority class with higher accuracy, but its overall predictive ability is only moderate. The support vector machine performs averagely. Random forest and LightGBM perform better across the different classes. The combined model brings together the advantages of its two base models. Finally, the paper selects the random forest model for loan overdue prediction and describes the characteristics of overdue users for reference.
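Several of the simpler steps in the pipeline above can be illustrated in code. The following is a minimal, standard-library-only Python sketch, not the thesis's implementation: all function names are hypothetical, the 0.9 correlation cut-off is an assumed value, and random undersampling of the majority class is shown as just one possible balancing method (the abstract does not say which sampling method was chosen). AUC is computed as the Mann-Whitney probability that a random positive outscores a random negative, and MCC from the binary confusion matrix.

```python
import math
import random
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns: Dict[str, List[float]], threshold: float = 0.9) -> List[str]:
    """Keep a column only if its |r| with every already-kept column is below
    `threshold` (0.9 is a hypothetical cut-off, not taken from the thesis)."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score standardization: (x - mean) / population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]


def undersample(labels: Sequence[int], ratio: float, seed: int = 0) -> List[int]:
    """Random undersampling (assumed method): keep every minority (label 1) row plus
    a random subset of majority (label 0) rows so minority:majority is about `ratio`."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(len(pos) / ratio))
    return sorted(pos + rng.sample(neg, n_neg))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (ties count one half), i.e. the normalized Mann-Whitney statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mcc(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Matthews correlation coefficient from the binary confusion matrix."""
    pairs = list(zip(labels, preds))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy columns (hypothetical data): income_x2 is a rescaled copy of income.
    cols = {"income": [1, 2, 3, 4, 5], "income_x2": [2, 4, 6, 8, 10], "age": [30, 25, 40, 35, 28]}
    print(drop_correlated(cols))  # income_x2 is perfectly correlated with income, so it is dropped
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because the abstract says the sampling ratio was tuned per model, a helper like `undersample` would in practice be called for several candidate ratios, with each resulting model compared on AUC and MCC. Oversampling methods such as SMOTE are common alternatives; the feature-selection and modeling steps (XGBoost, random forest, LightGBM) depend on third-party libraries and are not sketched here.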
Keywords/Search Tags: Overdue loan, Imbalanced data set, Logistic regression, Ensemble learning model, Combined model