
Financial Risk Recognition Model And Its Application Based On Classical Scoring Card And Machine Learning

Posted on: 2020-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Bai
GTID: 2439330575952046
Subject: Applied statistics
Abstract/Summary:
The booming development of Internet finance provides faster financial services for customers with financing needs, but it also brings problems such as credit risk and user fraud. Quantitatively predicting and controlling risk through an effective credit scoring system and quantitative analysis models is therefore a hot topic in current risk research. In the context of big data, this paper addresses the shortcomings of measuring risk with logistic regression alone. Using online cash loan data from a consumer finance company for 2017 and 2018, it builds models with three machine learning algorithms, GBDT, XGBoost and LightGBM, and compares them with a logistic regression model. On this basis it builds a Stacking model, in the expectation of achieving better performance. First, the sample contains 22,925 records and 1,327 variables, covering commodity consumption, stability, online shopping, borrowing intention (multiple applications), real-name information and address information. Variables with more than 95% missing values, or in which a single value accounts for more than 95% of records, are deleted. The remaining variables are WOE-binned, variables with an IV below 0.02 are removed, and categorical variables are converted to dummy variables, leaving 843 variables. Second, lasso regression, stepwise regression and variable derivation are applied to these 843 variables, and 87 important variables are selected for modeling. Then logistic regression and the three machine learning algorithms GBDT, XGBoost and LightGBM are each used to construct a credit scoring model, and the models are compared using two metrics, AUC and KS. The results show that all three machine learning models outperform the single logistic regression model, with LightGBM performing best. On this basis, GBDT, XGBoost and LightGBM are used as first-level base models and logistic regression as the second-level model of a Stacking ensemble, with the aim of achieving still better performance. The Stacking model achieves the highest AUC and KS values of all the models. Its KS value is nearly 6% higher than that of logistic regression and nearly 1% higher than those of GBDT, XGBoost and LightGBM. Finally, a scorecard is built from the logistic regression predictions. Comparing users' credit scores with their actual default rates verifies the reliability of the results, and several constructive recommendations are drawn from the distribution of credit scores and default rates. Because XGBoost improves on GBDT and LightGBM improves in turn on XGBoost, this paper builds models with all three algorithms and verifies their advantages in model performance, resistance to overfitting and running speed. Combining them with logistic regression into a Stacking model on this basis is a degree of innovation.
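The variable screening described in the abstract can be sketched in plain Python. The screening first drops variables with more than 95% missing values or a single dominant value. It then keeps, after WOE binning, only variables with IV of at least 0.02. This is a minimal illustration, not the thesis's code. Several choices are assumptions:

- the WOE sign convention ln(good share / bad share);
- the 0.5 smoothing added to every cell;
- treating None as missing;
- the function names `passes_screening` and `woe_iv`.

The binning itself is also assumed to have been done beforehand.

```python
import math
from collections import Counter

# Thresholds quoted in the abstract; everything else here is an assumption.
MISSING_THRESHOLD = 0.95       # drop a variable if more than 95% of values are missing
SINGLE_VALUE_THRESHOLD = 0.95  # drop a variable if one value covers more than 95% of rows
IV_THRESHOLD = 0.02            # drop a binned variable if its IV is below 0.02


def passes_screening(values):
    """Return True if a column survives the missing-value and single-value filters.

    Missing entries are assumed to be None; the single-value share is
    measured against all rows, including missing ones.
    """
    n = len(values)
    missing = sum(v is None for v in values)
    if missing / n > MISSING_THRESHOLD:
        return False
    present = [v for v in values if v is not None]  # non-empty after the check above
    top_count = Counter(present).most_common(1)[0][1]
    return top_count / n <= SINGLE_VALUE_THRESHOLD


def woe_iv(bins, labels, smoothing=0.5):
    """Compute per-bin WOE and the total IV of one already-binned variable.

    labels: 1 = default (bad), 0 = non-default (good).
    WOE_i = ln(good_share_i / bad_share_i)
    IV    = sum_i (good_share_i - bad_share_i) * WOE_i
    `smoothing` is added to every cell count to avoid log(0).
    """
    good, bad = Counter(), Counter()
    for bin_label, y in zip(bins, labels):
        if y == 1:
            bad[bin_label] += 1
        else:
            good[bin_label] += 1
    categories = sorted(set(bins), key=str)
    k = len(categories)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe, iv = {}, 0.0
    for c in categories:
        g = (good[c] + smoothing) / (total_good + smoothing * k)
        b = (bad[c] + smoothing) / (total_bad + smoothing * k)
        woe[c] = math.log(g / b)
        iv += (g - b) * woe[c]
    return woe, iv


if __name__ == "__main__":
    # Toy variable: bin "A" has 5 defaults in 50 rows, bin "B" has 25 in 50.
    bins = ["A"] * 50 + ["B"] * 50
    labels = [0] * 45 + [1] * 5 + [0] * 25 + [1] * 25
    woe, iv = woe_iv(bins, labels)
    print(woe, round(iv, 3), "keep" if iv >= IV_THRESHOLD else "drop")
```

Each IV term (g - b) * ln(g / b) is non-negative, so a variable whose bins all share the same default rate gets an IV of zero and is dropped by the 0.02 cut-off.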
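The two evaluation metrics used to compare the models can be computed directly from predicted default probabilities. The sketch below rests on a few assumptions:

- label 1 means default, and a higher score means higher risk;
- KS is the maximum gap between the cumulative distributions of defaulters and non-defaulters as the cut-off moves down the score ranking;
- AUC is the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter.

The function names are hypothetical. The quadratic AUC loop is for clarity only, and a rank-based formula scales better.

```python
def ks_statistic(scores, labels):
    """KS: maximum gap between the cumulative bad and good distributions.

    scores: predicted default probabilities (higher = riskier).
    labels: 1 = default (bad), 0 = non-default (good).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Absorb every sample tied at this score before measuring the gap,
        # so the cut-off never splits a group of identical scores.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
        i = j
    return ks


def auc(scores, labels):
    """AUC: probability that a random defaulter outscores a random non-defaulter.

    Ties count as half a win. O(n_bad * n_good), for illustration only.
    """
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(ks_statistic(scores, labels), auc(scores, labels))  # prints 0.5 0.75
```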
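The two-level Stacking design, with tree-based base models at the first level and logistic regression at the second, can be illustrated with out-of-fold predictions. GBDT, XGBoost and LightGBM are not in the Python standard library, so this sketch substitutes a hypothetical base learner, `make_rate_model`. It predicts the smoothed training default rate for one categorical feature. The 5-fold out-of-fold scheme, the gradient-descent logistic regression and all function names are also assumptions, since the abstract does not say how the second-level training data were produced.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def out_of_fold(X, y, fit_fns, k=5, seed=0):
    """Level-1 step: each row gets one prediction per base model, made by a
    model trained only on the other folds (the row itself is never seen)."""
    n = len(X)
    meta = [[0.0] * len(fit_fns) for _ in range(n)]
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(fit_fns):
            predict = fit(X_tr, y_tr)
            for i in fold:
                meta[i][m] = predict(X[i])
    return meta


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(F, y, lr=0.5, epochs=500):
    """Level-2 step: logistic regression fitted by batch gradient descent on log-loss."""
    w = [0.0] * (len(F[0]) + 1)  # w[0] is the intercept
    n = len(F)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(F, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], row))) - target
            grad[0] += err
            for j, x in enumerate(row, start=1):
                grad[j] += err * x
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return lambda row: sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], row)))


def fit_stacking(X, y, fit_fns, k=5):
    """Train the meta-model on out-of-fold predictions, then refit every base
    model on all rows so new applicants can be scored."""
    meta_model = fit_logistic(out_of_fold(X, y, fit_fns, k), y)
    full_models = [fit(X, y) for fit in fit_fns]
    return lambda x: meta_model([predict(x) for predict in full_models])


def make_rate_model(feature):
    """Hypothetical stand-in for a tree model: predicts the smoothed training
    default rate observed for the row's value of one categorical feature."""
    def fit(X_tr, y_tr):
        prior = sum(y_tr) / len(y_tr)
        stats = {}
        for x, t in zip(X_tr, y_tr):
            bads, count = stats.get(x[feature], (0, 0))
            stats[x[feature]] = (bads + t, count + 1)

        def predict(x):
            bads, count = stats.get(x[feature], (0, 0))
            return (bads + prior) / (count + 1)
        return predict
    return fit


if __name__ == "__main__":
    # Toy data: an applicant defaults whenever a + b >= 2.
    X = [(a, b) for a in range(2) for b in range(3)] * 20
    y = [1 if a + b >= 2 else 0 for a, b in X]
    model = fit_stacking(X, y, [make_rate_model(0), make_rate_model(1)])
    print(round(model((0, 0)), 3), round(model((1, 2)), 3))
```

The second level is trained on out-of-fold rather than in-sample predictions. This keeps it from simply trusting whichever base model overfits the training data most.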
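Logistic regression output is commonly turned into scorecard points with a points-to-double-odds (PDO) scaling. The base score of 600, base bad:good odds of 1:50 and PDO of 20 below are illustrative assumptions, not values reported in the thesis. The names `score_from_probability` and `scorecard_points` are hypothetical. When the regression is fitted on WOE-encoded inputs, the same score also splits into base points plus one term per variable.

```python
import math

# Illustrative scaling choices (assumptions, not taken from the thesis).
PDO = 20.0          # points needed to double the good:bad odds
BASE_SCORE = 600.0  # score assigned at the base odds
BASE_ODDS = 1 / 50  # base bad:good odds of 1:50

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE + FACTOR * math.log(BASE_ODDS)


def score_from_probability(p_default):
    """Score = OFFSET - FACTOR * ln(p / (1 - p)); riskier applicants score lower."""
    return OFFSET - FACTOR * math.log(p_default / (1 - p_default))


def scorecard_points(intercept, coefs, woe_values):
    """Split the score into base points plus one term per variable.

    Assumes a logistic regression on WOE-encoded inputs:
    ln(p / (1 - p)) = intercept + sum_j coef_j * woe_j.
    """
    base_points = OFFSET - FACTOR * intercept
    var_points = [-FACTOR * c * w for c, w in zip(coefs, woe_values)]
    return base_points, var_points


if __name__ == "__main__":
    for p in (0.02, 0.1, 0.3):
        print(p, round(score_from_probability(p), 1))
```

With this scaling, an applicant whose predicted bad:good odds equal 1:50 scores exactly 600. Halving those odds to 1:100 adds 20 points.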
Keywords/Search Tags: Internet Finance, Anti-fraud, Logistic Regression, Machine Learning, Scorecard