
Prediction Of Provident Fund Loans Overdue Based On Ensemble Learning

Posted on: 2022-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: G R Hua
GTID: 2518306314960519
Subject: Applied Statistics
Abstract/Summary:
In recent years, the overdue rate and the overdue amount of provident fund loans have risen continuously, so it is important to predict accurately whether provident fund borrowers will fall behind on their repayments. Taking the evaluation of overdue behavior among provident fund loan users as its research object, this paper builds ensemble learning models to predict whether a borrower will become overdue, providing a reference for the credit work of banks and other financial institutions.

The paper uses a provident fund loan dataset from a bank. It first visualizes the relationship between selected variables and the overdue rate, then preprocesses the data with feature encoding and feature selection. Loan overdue prediction models are then built with XGBoost and the random forest algorithm. The findings are as follows:
1) In terms of accuracy, the two models perform similarly, and both appear to make good use of the information in the dataset. Both achieve high prediction accuracy, so both are well suited to predicting overdue provident fund loans.
2) Comparing the parameter tuning process of the two models shows that the random forest has fewer parameters to tune than XGBoost, and its tuning process is relatively simple.
3) Recall is inversely related to the classification threshold: the lower the threshold, the higher the recall, so recall can be raised by lowering the threshold.
4) Neither model achieves high recall.

To address the low recall, the paper proposes an improved random forest model. While a random forest is being built, some decision trees with low recall are produced, and these trees ultimately lower the recall of the forest as a whole. The improved model introduces recall into the modeling process and improves performance by eliminating the decision trees with poor recall. Finally, the improved random forest model is compared with the XGBoost model and the original random forest model in terms of accuracy, recall, and F1 score. The comparison shows that the recall and F1 score of the improved random forest model increase significantly, while its accuracy remains essentially unchanged from that of the original model.
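The two recall-related ideas in the abstract, lowering the decision threshold and discarding trees with poor recall, can be illustrated with a minimal, library-free sketch. Everything here is an assumption made for illustration: the function names, the `min_recall` cutoff, the use of a single validation set, and the toy votes are hypothetical and are not taken from the thesis, whose exact pruning procedure is not described in the abstract.

```python
from typing import List, Sequence

def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def vote_share(tree_preds: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of trees voting 'overdue' (1) for each sample."""
    n_trees = len(tree_preds)
    return [sum(votes) / n_trees for votes in zip(*tree_preds)]

def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a sample as overdue when its vote share reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def prune_by_recall(tree_preds, y_val, min_recall):
    """Keep only trees whose individual validation recall is >= min_recall.

    Falls back to the full forest if every tree would be dropped.
    """
    kept = [p for p in tree_preds if recall(y_val, p) >= min_recall]
    return kept or list(tree_preds)

# Hypothetical validation labels (1 = overdue) and per-tree 0/1 predictions.
Y_VAL = [1, 1, 1, 1, 0, 0, 0, 0]
TREE_PREDS = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],  # low-recall tree
    [1, 0, 0, 0, 0, 0, 0, 0],  # low-recall tree
    [1, 1, 1, 1, 1, 0, 0, 0],
]

if __name__ == "__main__":
    full_scores = vote_share(TREE_PREDS)
    # Lower thresholds label more samples as overdue, so recall cannot drop.
    for th in (0.7, 0.5, 0.3):
        print("threshold", th, "recall", recall(Y_VAL, classify(full_scores, th)))

    kept = prune_by_recall(TREE_PREDS, Y_VAL, min_recall=0.5)
    pruned_scores = vote_share(kept)
    print("trees kept:", len(kept))
    print("pruned recall at 0.5:", recall(Y_VAL, classify(pruned_scores, 0.5)))
```

In practice the per-tree predictions would come from a trained forest, and the pruned ensemble should be scored on data that was not used to select the trees. Measuring it on the same validation set, as this toy example does, gives an optimistic estimate.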
Keywords/Search Tags: Provident fund loan, Random forest, XGBoost, Recall