Sampling issues in credit scoring: Missing data, reject inference and treatment effects

Posted on: 2006-01-18
Degree: Ph.D.
Type: Thesis
University: University of Waterloo (Canada)
Candidate: Chen, Gongyue
GTID: 2459390008465367
Subject: Economics

Abstract/Summary:

Credit scoring uses statistically sophisticated and empirically validated prediction models to assess obligors' credit risk. The objective of building a prediction model in credit scoring is straightforward: the model must be as accurate as possible so that it can be used to evaluate many credit risk objects. However, researchers are frequently hampered by the difficulty of obtaining representative data because of various sampling problems. In this dissertation I focus on three types of sampling issues that arise when building prediction models in credit scoring: missing data, sample selection bias and treatment effects.

I investigate problems associated with imputing missing data for categorical survey questions when data are sparse. Categorical survey questions are commonly used to collect data for credit scoring. I provide a Bayesian theoretical framework for imputing missing categorical data under the assumption of missing at random, and suggest approaches to improve model estimation using Markov chain Monte Carlo and multiple imputation. I then test the efficiency of this method by simulation. A literature review shows that missing data imputation for categorical survey questions has not been well addressed, and the method I recommend has not previously been applied in credit scoring research. My research shows that this method is as efficient as other traditional methods and is therefore worthy of further analysis and application.

In credit scoring, sample selection bias is commonly referred to as "reject inference": observations of the dependent variable are partially missing because of a deliberate credit rejection process. A literature review leads to the conclusion that most solutions currently proposed for reject inference are not fully validated, which may be partly due to commercial confidentiality. Moreover, these solutions are usually appropriate only when certain restrictive underlying assumptions hold, and those assumptions can be crucial for model performance. In this part of the thesis, I first use data with complete information on both rejected and accepted bank loan applicants to estimate the value of sample bias correction using Heckman's two-stage model with partial observability and a traditional augmentation method, and show that these approaches are efficient only under restrictive conditions. I then suggest that reject inference can be mapped to missing data mechanisms, and propose a solution based on a Bayesian imputation method under the assumption of missing not at random. Within the popular maximum likelihood framework, I suggest an adjusted logit model for reject inference, whose strategy is to replace the missing values of the dependent variable with their conditional expectation given the observed data (a generic version of this idea is sketched below). These solutions do not require the otherwise common assumption that the proportion of good accounts is the same for the rejects as for the accepted applicants.
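The abstract does not give the exact form of the adjusted logit model, so the following is only a minimal sketch of the general idea it describes: score the rejects with a model fitted on the accepted applicants, treat that score as the conditional expectation of the missing outcome, and refit on the augmented sample (a fuzzy-augmentation-style scheme). The synthetic data, the rejection rule, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the thesis's method.

```python
# Minimal reject-inference sketch: replace the missing outcome of rejects with
# its conditional expectation from an accepted-only model, then refit.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: X are applicant features, y is 1 = "good" (repaid), observed
# only for accepted applicants; rejects have y missing by construction.
n = 5000
X = rng.normal(size=(n, 3))
true_p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.2)))
y = rng.binomial(1, true_p)
accepted = X[:, 0] + 0.3 * rng.normal(size=n) > 0   # deliberate rejection rule

# Step 1: fit on accepted applicants only (the biased known-good/bad sample).
kgb = LogisticRegression().fit(X[accepted], y[accepted])

# Step 2: conditional expectation E[y | x] for the rejects, here taken from the
# accepted-only model (an assumption; any better-informed score could be used).
p_rej = kgb.predict_proba(X[~accepted])[:, 1]

# Step 3: refit on accepted + rejects, each reject entering twice, as a
# fractional good (weight p) and a fractional bad (weight 1 - p).
X_aug = np.vstack([X[accepted], X[~accepted], X[~accepted]])
y_aug = np.concatenate([y[accepted],
                        np.ones((~accepted).sum()),
                        np.zeros((~accepted).sum())])
w_aug = np.concatenate([np.ones(accepted.sum()), p_rej, 1 - p_rej])

adjusted = LogisticRegression().fit(X_aug, y_aug, sample_weight=w_aug)
print(adjusted.coef_, kgb.coef_)
```

Comparing the two coefficient vectors shows how the augmentation shifts the accepted-only estimates; whether that shift is an improvement depends on how well the conditional expectation is approximated, which is precisely the restrictive-assumption issue the thesis examines.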
Tests show that these solutions are most likely to increase the accuracy of prediction models when data are missing not at random.

In certain situations, a risk assessment may have a treatment effect that leads to a self-fulfilling prophecy of the assessment. For example, granting a loan to an applicant who is observationally identical to a non-applicant is likely to cause the applicant to perform better. A prediction model built on a sample from a treated group (e.g., accepted applicants) may therefore be biased. I first evaluate a maximum likelihood solution and discuss its drawbacks. I then provide a benchmark evaluation model to assess the treatment effect for a specific problem. Quantitative measurement of treatment effects is difficult and rare in the credit scoring domain, so my research in this area sheds some light on how this form of sample selection bias can be assessed. Accurate measurement of this bias is important to make prediction...
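The abstract does not describe the benchmark evaluation model, so the sketch below only illustrates, on synthetic data, the basic issue the paragraph raises: when better applicants are more likely to be granted credit, a naive treated-versus-untreated comparison confounds the treatment effect with selection, and a covariate adjustment recovers it only under the strong assumption that the observed covariate captures all confounding. Every number and variable name here is hypothetical.

```python
# Toy illustration of a treatment effect confounded with selection into treatment.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)                                 # observed creditworthiness proxy
treated = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))    # better applicants more likely to get the loan
tau = 0.15                                             # hypothetical "self-fulfilling" boost from being granted credit
outcome = 0.5 * x + tau * treated + rng.normal(scale=0.5, size=n)

# Naive contrast mixes the treatment effect with selection on x.
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Regression adjustment on the observed covariate recovers tau only if x
# captures all confounding (a strong, stated assumption).
adj = LinearRegression().fit(np.column_stack([x, treated]), outcome).coef_[1]
print(f"naive difference: {naive:.3f}, covariate-adjusted estimate: {adj:.3f}")
```

The gap between the naive and adjusted estimates is the kind of selection bias the thesis argues must be measured before a score built on accepted applicants can be trusted.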
Keywords/Search Tags: Credit scoring, Data, Reject inference, Prediction, Sample selection bias, Model, Categorical survey questions, Sampling