Research On Credit Risk Prediction Of Internet Finance Under The Background Of Big Data

Posted on: 2022-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: M H Du
Full Text: PDF
GTID: 2517306566990099
Subject: Statistics
Abstract/Summary:
As a new financial business model built on Internet technology, Internet finance can, to a certain extent, ease the financing difficulties of SMEs. As Internet technology develops, the data sources of Internet finance become broader and the volume of data keeps growing. Compared with traditional financial lending, this volume of data makes Internet financial credit risk harder to predict. If methods designed for traditional financial credit risk are applied directly, the computational requirements are high and implementation is difficult. This thesis therefore focuses on predicting Internet financial credit risk in the context of big data.

The thesis combines two two-step subsampling algorithms with the logistic regression model and evaluates their effect on Internet financial credit risk prediction through numerical simulation and empirical analysis. On the simulated data set, it compares the logistic regression model based on two-step subsampling with the logistic regression model based on simple random sampling. Measured by accuracy, precision, recall and F1 score, the former outperforms the latter, while the two two-step subsampling algorithms differ little from each other.

The dimension of Internet credit data varies. If a data set has a large number of independent variables, the important variables must be screened before the model is built; otherwise the model can be built directly. The thesis analyzes both cases in detail. In the first case, with a large number of independent variables, an online lending data set is used to compare the random forest-logistic model and the Lasso-logistic model in predicting online lending credit risk. Here the random forest-logistic model is a logistic model built on the important variables selected by a random forest. The better model is then combined with the sampling algorithms, which are compared on accuracy and time cost. In the second case, with a small number of independent variables, a credit card fraud data set is used: the logistic regression model is combined directly with the two-step subsampling algorithms, which are again compared on accuracy and time cost.

The empirical results show that, in the first case, the random forest-logistic model and the Lasso-logistic model differ very little in predicting online lending credit risk. The thesis takes the random forest-logistic model as the example and combines it with the two-step subsampling algorithms. Their accuracy and precision are higher than those of simple random sampling, which indicates better predictive performance. The two two-step subsampling algorithms perform similarly, and compared with fitting on the full sample they save a great deal of time. In the second case, the logistic regression model based on two-step subsampling has higher accuracy and smaller mean squared error than the one based on simple random sampling, so it predicts credit card fraud risk better. Again the two two-step subsampling algorithms perform similarly, and compared with the logistic regression model fitted on the full sample they maintain high accuracy while greatly reducing time.

Therefore, a logistic regression model based on the two-step subsampling algorithm can predict Internet financial credit risk well on both high-dimensional and low-dimensional Internet financial credit data.
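
The abstract does not spell out the two-step subsampling algorithm or name its two variants, so the following is only a minimal sketch of one common form from the optimal-subsampling literature, not the thesis's own implementation. It assumes that step 1 fits a pilot logistic model on a small uniform subsample, and that step 2 draws a second subsample with probabilities proportional to |y_i - p_i| * ||x_i|| and fits an inverse-probability-weighted logistic model on the combined subsample. The function names, the sample sizes `r0` and `r`, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-8):
    """Fit logistic regression with observation weights w by Newton's method."""
    w = w / w.sum()  # rescale weights for numerical stability
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        # Tiny ridge term guards against a singular Hessian.
        step = np.linalg.solve(hess + 1e-10 * np.eye(d), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample_logistic(X, y, r0, r, rng):
    """Assumed two-step scheme: uniform pilot, then informative subsample."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample (with replacement) and pilot estimate.
    idx0 = rng.integers(0, n, size=r0)
    beta0 = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))
    # Subsampling probabilities proportional to |y - p| * ||x|| (an assumption).
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    # Step 2: draw r points with these probabilities and refit on the
    # combined subsample, weighting each point by 1 / (its probability).
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    return fit_weighted_logistic(X[idx], y[idx], w)


# Simulated data (hypothetical): intercept plus three standard normal features.
rng = np.random.default_rng(0)
n = 100_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([-1.0, 0.5, 0.5, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_full = fit_weighted_logistic(X, y, np.ones(n))          # full-sample fit
beta_sub = two_step_subsample_logistic(X, y, 500, 2000, rng)  # 2,500 rows used
print("full sample:", np.round(beta_full, 3))
print("two-step   :", np.round(beta_sub, 3))
```

Under these assumptions the second fit touches only `r0 + r` rows rather than all `n`, which is where the time saving described in the abstract would come from. The inverse-probability weights are what keep the estimate comparable to the full-sample fit despite the non-uniform sampling.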
Keywords/Search Tags:Internet financial credit risk, Big data sampling, Random forest-logistic model, Lasso-logistic model, Two-step subsampling algorithm