
Research And Prototype Implementation Of Financial Credit Evaluation Parallel Learning Model Support Technology

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Li
Full Text: PDF
GTID: 2428330620464050
Subject: Engineering
Abstract/Summary:
Financial security is an important part of national security. With the rapid development of Internet finance and smart finance, the risks and challenges faced by financial services are increasing. The advent of the big data era has made machine learning an important tool for financial risk control, and the growth in data volume has brought both challenges and opportunities to traditional machine learning. In the Internet finance industry, traditional machine learning techniques designed for small data struggle to meet the needs of the big data era. Parallelizing machine learning algorithms so that data processing responds faster without sacrificing accuracy is therefore a key issue in the field of intelligent financial algorithms.

Against this background of financial security, this thesis studies the credit evaluation problem in the field of financial risk control. It takes the credit scoring model, a machine-learning-backed technology, as its research target and selects four mainstream algorithms used in credit scoring: logistic regression, GBDT, XGBoost, and LightGBM. Starting from parallel machine learning and taking into account the characteristics of the Internet financial credit business, a public online credit data set is used for modeling and analysis.

This thesis first builds a lifecycle management platform for machine learning models on the Zeppelin framework, with a Hadoop cluster as the underlying support; data processing and model training are carried out on this platform. On the platform, an open loan data set covering 2007 to 2015 undergoes data quality checks, data cleaning, feature construction, and feature selection. Subsequent modeling and model deployment are also performed on the platform.

Considering the characteristics of the credit data set and of the selected models, and based on data parallel theory, this thesis designs two parallel mechanisms that rely on the parallel python module running in cluster mode. Both use the voting method and the stacking method as model fusion strategies. One of them applies undersampling multiple times to deal with the class imbalance of the data set; the other follows the idea of the EasyEnsemble algorithm for this problem and adds further model learning on the residuals. As a control experiment, the SMOTE algorithm is used to handle class imbalance when each single model is trained on its own.

Logistic regression, GBDT, XGBoost, and LightGBM are each plugged into the parallel mechanisms for verification, and corresponding single-model comparison experiments are performed. Under cross-validation, AUC, the KS value, and PSI are used as model evaluation indicators. The experimental results show that the parallel model mechanisms proposed in this thesis on the basis of data parallel theory all improve the response speed of data processing while maintaining the accuracy of the calculation. The speedup is especially pronounced when GBDT, XGBoost, or LightGBM is used as the base model in the parallel mechanism.
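The general idea behind the first mechanism (repeated undersampling, independent training on each balanced subset, fusion by voting) can be illustrated with a minimal sketch. This is not the thesis implementation: the standard library's `concurrent.futures` stands in for the parallel python module, a one-feature threshold rule stands in for GBDT, XGBoost, LightGBM, or logistic regression, the data are synthetic, and every function name below is hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def undersample(rows, seed):
    """Keep every minority row (label 1) plus an equal-size random draw of majority rows."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    return minority + rng.sample(majority, len(minority))


def fit_stump(rows):
    """Toy base learner (an assumption): threshold halfway between the two class means."""
    pos = mean(x for x, y in rows if y == 1)
    neg = mean(x for x, y in rows if y == 0)
    return (pos + neg) / 2, pos > neg  # (threshold, positives lie above it?)


def predict_stump(model, x):
    threshold, pos_above = model
    return int((x > threshold) == pos_above)


def train_parallel(rows, n_models=5, workers=4):
    """Data parallelism: each worker trains on its own balanced subset."""
    def job(seed):
        return fit_stump(undersample(rows, seed))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, range(n_models)))


def vote(models, x):
    """Model fusion by simple majority vote."""
    votes = sum(predict_stump(m, x) for m in models)
    return int(votes * 2 > len(models))


# Synthetic, imbalanced data: 200 majority rows (label 0), 20 minority rows (label 1).
data_rng = random.Random(0)
rows = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(200)]
rows += [(data_rng.gauss(3.0, 1.0), 1) for _ in range(20)]

models = train_parallel(rows)
print(vote(models, 10.0), vote(models, -10.0))
```

Threads keep the sketch short, but they do not speed up CPU-bound training in CPython; a process pool or a cluster scheduler would be the natural substitute when the base learners are real models.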
Keywords/Search Tags:Credit evaluation, Data parallel, GBDT, XGBoost, LightGBM