
Research On Personal Credit Risk Model Based On Multiple Datasets

Posted on: 2019-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zhao
GTID: 2439330545497424
Subject: Statistics
Abstract/Summary:
Since the financial system reform of 1993, China's financial system has developed considerably, and the consumption level and consumption attitudes of the Chinese people have gradually changed. Credit business, especially personal consumer credit, has grown substantially. Housing mortgages, auto loans and other credit products continue to expand, and credit card spending is increasingly popular. The growth of the credit card business brings convenience to users and profits to commercial banks, but credit risk follows as well. There have been many studies on personal credit risk assessment models, but none based on multiple datasets. Because of the urban-rural dual structure of China's economic development, urban and rural residents differ in consumer attitudes, spending habits and spending power, and these differences in turn affect personal credit. This paper therefore treats urban and rural credit data as multiple datasets. For such data, simply pooling all the data for analysis may neglect the differences between datasets, while analyzing each dataset separately may miss the relevance between them.

This paper proposes a Logistic regression model based on multiple datasets. We use the Integrative Analysis method, taking the Logistic regression loss as the loss function and adding a Sign-based penalty on top of the Composite MCP penalty to encourage the coefficients of variables shared across datasets to have similar signs. The resulting cMCPs model is then used to analyze personal credit data. The method belongs to the class of bilevel variable selection methods. The group coordinate descent method is used to solve the optimization problem, and Accuracy, the True Positive Rate (TPR) and the AUC value are used as the criteria for evaluating the predictive performance of the model. The cMCPs model performs better than both the separate MCP-Logistic model and the pooled ("summarizing") MCP-Logistic model.

For the empirical analysis, we use data from the credit card department of a commercial bank. The pooled MCP-Logistic regression model, the separate MCP-Logistic regression model and the cMCPs-Logistic regression model are each fitted to the data. The empirical results show that the cMCPs model has the highest Accuracy, TPR and AUC values, and that the model is applicable to the practical problem. They also show that it is reasonable to analyze urban and rural areas as multiple datasets.
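As an illustration of two ingredients named in the abstract, the following minimal Python sketch shows the standard single-coefficient MCP penalty and the three evaluation criteria (Accuracy, TPR, AUC). It is not the thesis's implementation: the composite MCP and Sign-based penalties and the group coordinate descent solver are not reproduced, and the function names, the 0.5 classification threshold and the toy labels and scores are all assumptions made for illustration.

```python
def mcp_penalty(t, lam, gamma):
    """Standard MCP for one coefficient t (lam > 0, gamma > 1).

    The thesis uses a composite version plus a sign-based term,
    which this sketch does not attempt to reproduce.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # penalty is flat beyond gamma * lam


def accuracy(y_true, y_pred):
    """Share of observations whose predicted label matches the true label."""
    correct = sum(1 for yt, yp in zip(y_true, y_pred) if yt == yp)
    return correct / len(y_true)


def true_positive_rate(y_true, y_pred):
    """Share of true positives (label 1) that are predicted as positive."""
    positives = sum(1 for yt in y_true if yt == 1)
    hits = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 1)
    return hits / positives


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise form, O(n^2))."""
    pos = [s for yt, s in zip(y_true, scores) if yt == 1]
    neg = [s for yt, s in zip(y_true, scores) if yt == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]           # assumed model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(true_positive_rate(y_true, y_pred))
print(auc(y_true, scores))
```

The pairwise AUC is chosen here for transparency rather than speed; on data of realistic size a rank-based computation or an established library routine would be preferable.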
Keywords/Search Tags: Multiple Datasets, Integrative Analysis, Credit Risk