Ensemble Variable Selection With Application To Personal Credit Scoring

Posted on: 2019-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X H Yang | Full Text: PDF
GTID: 2439330572994889 | Subject: Finance
Abstract/Summary:
Credit granting is a fundamental and very complex task in consumer financial institutions, and credit scoring is an important tool for it. By analyzing large volumes of historical data, credit scoring helps credit experts evaluate a customer's default risk accurately. Credit scoring databases, however, typically contain redundant and irrelevant variables. Using all of these variables makes credit scoring time-consuming and also causes the "curse of dimensionality" problem; both problems can be addressed by variable selection. Credit experts usually apply variable selection methods solely to boost prediction performance and pay little attention to the stability of those methods. A variable selection method is unstable when small changes to the data lead to large changes in the set of selected variables. Instability makes the selected risk indicators unreliable and invalid, and ultimately harms the profits of financial institutions. A stable variable selection method can therefore contribute substantially to accurate credit risk evaluation.

Ensemble learning combines multiple weak learners into a strong learner and thereby overcomes instability. Borrowing this idea, multiple variable selection methods can likewise be combined to overcome the instability of variable selection. This thesis first explains the causes of instability in variable selection methods. It then introduces the theoretical framework and the elements of ensemble variable selection in detail. Existing ensemble variable selection methods mainly combine selectors of the same type, which share the same weaknesses. To address this, a new ensemble variable selection method based on two types of variable selection methods is designed. From the filter type, the Pearson correlation coefficient, the Spearman correlation coefficient, and median variable selection are chosen; from the embedded type, logistic regression and four variable selection methods based on random forest are chosen.

The method is applied to a dataset from an internet finance company. For stability, the variance is close to 0 and the similarity value is close to 1, which means that ensemble variable selection improves stability significantly. For predictability, logistic regression based on ensemble variable selection is superior both to logistic regression based on all variables and to elastic net logistic regression, which means that improving the stability of variable selection does not sacrifice the predictive performance of the model. In addition, the permutation test is significant, indicating that prediction based on ensemble variable selection is robust.
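The idea of combining several selectors and then measuring the stability of the result can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses only two of the filter methods named above (Pearson and Spearman), and the mean-rank aggregation rule, the bootstrap resampling, the Jaccard similarity measure, the synthetic data, and all function names (`ensemble_select`, `stability`, and so on) are assumptions made for illustration.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ensemble_select(columns, target, k, scorers=(pearson, spearman)):
    """Keep the k variables with the best mean rank across all scorers.

    Assumed aggregation rule: each scorer ranks variables by absolute
    correlation with the target, and the ranks are summed.
    """
    names = list(columns)
    total_rank = {name: 0.0 for name in names}
    for scorer in scorers:
        # Negate so that the strongest association receives rank 1.
        scores = [-abs(scorer(columns[name], target)) for name in names]
        for name, r in zip(names, ranks(scores)):
            total_rank[name] += r
    return set(sorted(names, key=lambda name: total_rank[name])[:k])


def stability(columns, target, k, n_boot=20, seed=0):
    """Mean pairwise Jaccard similarity of selections on bootstrap samples."""
    rng = random.Random(seed)
    n = len(target)
    subsets = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_cols = {name: [col[i] for i in idx] for name, col in columns.items()}
        boot_y = [target[i] for i in idx]
        subsets.append(ensemble_select(boot_cols, boot_y, k))
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return sum(sims) / len(sims)


# Synthetic data (hypothetical): two informative and two noise variables.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(300)]
data = {
    "signal_a": [v + rng.gauss(0, 0.5) for v in y],
    "signal_b": [2 * v + rng.gauss(0, 1.0) for v in y],
    "noise_a": [rng.gauss(0, 1) for _ in y],
    "noise_b": [rng.gauss(0, 1) for _ in y],
}
selected = ensemble_select(data, y, k=2)
score = stability(data, y, k=2)
print(sorted(selected), round(score, 3))
```

A similarity value near 1 means that nearly the same variables are selected on every resample, which matches the sense of stability described in the abstract. Embedded selectors such as logistic regression or random forest importances could be added as further entries in `scorers`, provided they expose a per-variable score.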
Keywords/Search Tags: Variable Selection, Ensemble Learning, Stability, Credit Risk, Credit Scoring