
Research On Credit Risk Prediction Method Based On Big Data Financial Cloud Platform

Posted on: 2018-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhao
Full Text: PDF
GTID: 2359330536984870
Subject: Computer software and theory
Abstract/Summary:
With the growth in credit card issuance and the rise of a variety of Internet consumer loans, customer credit risk prediction has become an important research direction in the financial industry. Against the background of the big data era, this thesis establishes a unified data platform that integrates data from multiple sources, builds multidimensional models of the data on the platform, and uses Spark for data processing. Combined with XGBoost, one of the most popular machine learning frameworks of recent years, it constructs a prediction model for overdue credit risk. Because overdue records are heavily imbalanced, the idea of cost-sensitive learning is integrated into the XGBoost algorithm, and the resulting cs-xgboost algorithm is introduced to solve the classification problem on imbalanced data sets. The main research work of this thesis is as follows:

(1) After investigating and analyzing the mature big data technologies used in the Internet industry, a Hadoop cluster with the Spark computing framework is chosen as the experimental environment, and Ambari is used to create, manage, and monitor the Hadoop cluster.

(2) For the data integration module, traditional data warehouse modeling methods are combined with business requirements to build layered models of the multi-source data in Hive. MOAI is chosen as the ETL scheduler for the big data platform, which completes the design of the data integration module.

(3) To address binary classification on imbalanced data sets, a cs-xgboost algorithm based on cost-sensitive learning and the XGBoost framework is proposed, and its classification performance is verified on the open data set Data Hackathon 3.x AV. RFE, random-forest-based feature importance, and other feature selection methods are used to rank the features. On the basis of this feature selection, random forest and cs-xgboost are used to predict overdue credit risk, and the performance of the algorithms is compared on the training set and the test set.
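
The cost-sensitive idea in (3) can be illustrated with a minimal sketch. The abstract does not give the exact cs-xgboost formulation, so this example assumes one common choice: a logistic loss in which each sample's error is scaled by a class-dependent cost. Gradient boosting frameworks such as XGBoost accept custom objectives that return a per-sample gradient and Hessian, and that is what the sketch computes. The function names, the `pos_cost` and `neg_cost` parameters, and the inverse-class-frequency heuristic are illustrative assumptions, not taken from the thesis.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_sensitive_grad_hess(raw_scores, labels, pos_cost=5.0, neg_cost=1.0):
    """Gradient and Hessian of a class-weighted logistic loss.

    Assumed loss (hypothetical, not the thesis's stated formula):
        loss_i = -c_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]
    where p_i = sigmoid(raw_score_i) and c_i is pos_cost for the
    minority (overdue) class and neg_cost otherwise.

    Differentiating with respect to the raw score gives:
        grad_i = c_i * (p_i - y_i)
        hess_i = c_i * p_i * (1 - p_i)
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        c = pos_cost if y == 1 else neg_cost
        grads.append(c * (p - y))
        hessians.append(c * p * (1.0 - p))
    return grads, hessians


def inverse_frequency_cost(labels):
    """Heuristic cost for the positive class: (#negatives / #positives)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    return (len(labels) - n_pos) / n_pos


if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0]          # one overdue record out of five
    scores = [0.0] * len(labels)      # raw scores before any boosting round
    cost = inverse_frequency_cost(labels)
    grads, hessians = cost_sensitive_grad_hess(scores, labels, pos_cost=cost)
    print(cost, grads, hessians)
```

With a positive-class cost greater than 1, a misclassified overdue record contributes a larger gradient than a misclassified normal record, so each boosting round pays more attention to the minority class. XGBoost's built-in `scale_pos_weight` parameter applies a similar reweighting without a custom objective.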
Keywords/Search Tags: overdue risk prediction, cs-xgboost, cost-sensitive, imbalance