Research And Application Of Credit Risk Evaluation Model Based On Dimensionality Reduction Method

Posted on: 2022-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liu
GTID: 2480306482968999
Subject: Applied Statistics
Abstract/Summary:
In the era of big data, the dimensionality reduction methods most commonly applied to practical machine learning problems are principal component analysis (PCA), which is based on feature extraction, and the direct application of Lasso-family methods, which are based on feature selection. There are probably two reasons why such simple methods prevail. First, raw big data are often massive, noisy, and disorganized; the group structure is usually not known a priori and is difficult to identify. When a more complex algorithm brings little improvement in results, Occam's razor suggests choosing the simplest model possible. Second, big data are mostly generated in the Internet field, and Internet companies often do not lack computing power; what they pursue is simplicity, efficiency, and fast iteration, so simple variable screening algorithms are easier to put into practice. So, in actual machine learning tasks, for data sets of different scales and different structural characteristics, which dimensionality reduction method is more "cost-effective"?
At present, the choice basically relies on the modeler's experience and personal preference, and there is no widely accepted empirical criterion. To better understand and solve the problem of selecting a dimensionality reduction method when using classification algorithms, especially for credit risk discrimination based on financial data such as credit data, this research compares the combined performance of typical dimensionality reduction methods and classification algorithms on data sets with different data structures, and gives empirical guidelines. First, the article systematically reviews related research on credit risk along two dimensions, corporate credit risk and personal credit risk. It also analyzes the models and methods commonly used for variable screening in credit risk research, and clarifies that, from the perspective of machine learning, credit risk research is in essence a classification problem on credit financial data. Second, on simulated data sets with five data structures, i.e. n<<p, n<p, n=p, n>p, and n>>p, the two dimensionality reduction algorithms PCA and Lasso-Logistic are each combined with classical classification algorithms, viz. K-nearest neighbors, logistic regression, naive Bayes, support vector machines, decision trees, and random forests, and the combined effects are simulated by Monte Carlo. The study finds that, in terms of the number of features retained after dimensionality reduction, the quality of those features, and the amount of computation, Lasso-Logistic dimensionality reduction is better than PCA dimensionality reduction. In a machine learning classification task, first using the Logistic algorithm with a Lasso-family penalty term to screen the input variables, and then classifying on the important features that remain, usually gives better classification performance.
Third, based on an empirical analysis of overdue data on credit card customers from a credit institution, we explore the main influencing factors of personal credit default and their effects, and build a related model to verify the Monte Carlo simulation results. The empirical results generally support the simulation results. We also find that the factors with a greater impact on credit default are mainly the customer's credit history behavior data and personal background information. Credit history behavior data centers on variables such as historical repayment records and recent payments, while personal background information centers on educational background, marital status, age, and similar information. The influence of gender and other factors, however, is not significant.
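The Monte Carlo comparison summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's code: the data-generating process, the sample sizes, the 90% variance cutoff for PCA, the penalty set to a fraction of its largest useful value, and the choice of K-nearest neighbors as the only classifier are all assumptions made for illustration, and every function name is hypothetical. Only NumPy is used, so the Lasso-Logistic step is written out as a simple proximal gradient loop without an intercept.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def simulate(n, p, rng, k_informative=5):
    """Assumed data-generating process: only the first k features drive the label."""
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k_informative] = 2.0
    y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
    return X, y

def pca_reduce(X_tr, X_te, var_kept=0.90):
    """Project onto the leading principal components fitted on the training set."""
    mean = X_tr.mean(axis=0)
    _, s, vt = np.linalg.svd(X_tr - mean, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cum, var_kept)) + 1, len(s))
    W = vt[:k].T
    return (X_tr - mean) @ W, (X_te - mean) @ W

def lasso_logistic_select(X, y, frac=0.3, n_iter=300):
    """L1-penalised logistic regression (no intercept) by proximal gradient.
    Returns the indices of the features with nonzero coefficients."""
    n, p = X.shape
    # Penalty as a fraction of the smallest value that zeroes every coefficient.
    lam = frac * np.max(np.abs(X.T @ (0.5 - y))) / n
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        z = w - step * (X.T @ (sigmoid(X @ w) - y)) / n
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return np.flatnonzero(w)

def knn_accuracy(X_tr, y_tr, X_te, y_te, k=5):
    """Test accuracy of a plain K-nearest-neighbors majority vote."""
    d = (X_te**2).sum(1)[:, None] + (X_tr**2).sum(1)[None, :] - 2.0 * X_te @ X_tr.T
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = (y_tr[nearest].mean(axis=1) > 0.5).astype(float)
    return float(np.mean(pred == y_te))

def run_experiment(n, p, reps=20, seed=0):
    """Average accuracy and retained dimension over `reps` simulated data sets."""
    rng = np.random.default_rng(seed)
    out = {"pca_acc": [], "lasso_acc": [], "pca_dim": [], "lasso_dim": []}
    for _ in range(reps):
        X, y = simulate(n, p, rng)
        half = n // 2  # first half for training, second half for testing
        X_tr, X_te, y_tr, y_te = X[:half], X[half:], y[:half], y[half:]
        P_tr, P_te = pca_reduce(X_tr, X_te)
        keep = lasso_logistic_select(X_tr, y_tr)
        out["pca_acc"].append(knn_accuracy(P_tr, y_tr, P_te, y_te))
        out["lasso_acc"].append(knn_accuracy(X_tr[:, keep], y_tr, X_te[:, keep], y_te))
        out["pca_dim"].append(P_tr.shape[1])
        out["lasso_dim"].append(len(keep))
    return {name: float(np.mean(vals)) for name, vals in out.items()}

if __name__ == "__main__":
    # Illustrative sizes for n<<p, n<p, n=p, n>p, n>>p (not the thesis's settings).
    for n, p in [(40, 400), (100, 200), (200, 200), (400, 100), (1000, 20)]:
        print(n, p, run_experiment(n, p, reps=10))
```

Each row of the printed output gives, for one (n, p) setting, the mean test accuracy and mean number of retained dimensions under each reduction method. Swapping in the other classifiers named in the abstract, or a cross-validated penalty, would be the natural next step toward the fuller comparison described there.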
Keywords/Search Tags:Credit risk assessment, Dimensionality reduction, PCA, Lasso-Logistic, Classification