Analysis Of Internet Finance Credit Risk Based On Dask

Posted on: 2024-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Y Yang  Full Text: PDF
GTID: 2557307079991529  Subject: Applied statistics
Abstract/Summary:
The rapid development of Internet financial credit has brought convenience to a large number of users. At the same time, the number of defaulting customers is increasing, causing heavy losses to Internet financial institutions. This research builds a customer default prediction model on the Dask big data platform using customer information from the Lending Club platform, which addresses the excessive running time on a single machine and improves the identification rate of defaulting customers. First, the data is preprocessed: missing values are filled, discrete variables are encoded as dummy variables, continuous variables are binned, and features are selected with the Random Forest algorithm. Defaulting customers account for only 7.6% of the data set, which is therefore imbalanced, so the SMOTE-Tomek Links combined sampling algorithm is used to balance it. Finally, the random forest, gradient boosting tree, LightGBM, CatBoost, and CB_LGBM fusion models are each combined with the Dask distributed computing framework for training and prediction. The Dask-based CB_LGBM model performs best at identifying defaulting customers: its accuracy reaches 96.14%, its recall reaches 92.17%, and its AUC is 0.9593, indicating good generalization ability. The model ran in 107.83 minutes, which the study considers fast. The CB_LGBM model reduces the time needed to identify defaulting customers and improves their recognition rate on Internet financial platforms. The Dask-based CB_LGBM model is better suited to the data set used in this research.
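The abstract reports accuracy, recall, and AUC on a data set where only 7.6% of customers default. The following is a minimal sketch, not the thesis code, of how these three metrics can be computed and why accuracy alone is misleading at that class ratio. The function names and the 1000-customer toy sample are hypothetical; only the 7.6% default share comes from the abstract.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of customers whose label (1 = default, 0 = no default) is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual defaulters that the model flags as defaulters."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter receives a higher score
    than a random non-defaulter (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy sample: 1000 customers, 76 of them defaulters (7.6%).
y_true = [1] * 76 + [0] * 924

# A trivial "model" that never predicts default and gives everyone the same score.
never_default = [0] * 1000
constant_scores = [0.0] * 1000

print("accuracy:", accuracy(y_true, never_default))
print("recall:  ", recall(y_true, never_default))
print("AUC:     ", auc(y_true, constant_scores))
```

On this toy sample the trivial model is right for all 924 non-defaulters, so its accuracy looks high even though it catches no defaulters and ranks no better than chance. This is why recall and AUC are reported alongside accuracy, and why the data set is rebalanced before training.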
Keywords/Search Tags: Credit Risk, Big Data, Machine Learning, CB_LGBM, Dask