
Research on a Bayesian Network-Based Missing Value Imputation Model for Incomplete Credit Data

Posted on: 2020-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Xu
GTID: 2428330620451272
Subject: Management Science and Engineering
Abstract/Summary:
With the deregulation of private capital, online and offline lending institutions of all kinds are developing rapidly. As a result, the credit market grows day by day, which drives steady growth of the credit service market and raises the demand for credit scoring models. The performance of a credit scoring model depends heavily on the quality of the raw dataset. However, missing values are ubiquitous in real-world applications, and they significantly reduce the accuracy and usability of credit scoring models. In practice, most models handle this problem by deleting the instances with missing values from the dataset, or by imputing missing values with the mean, the mode, or regression estimates. These methods often cause a significant loss of information or introduce bias. To address the challenge of missing data in the context of credit scoring, this thesis proposes a Bayesian network based iterative imputation model called BNII. The proposed BNII model has two stages: a preparatory stage and an imputation stage. In the first stage, a Bayesian network over all attributes of the original dataset is constructed from the complete data. This stage learns both the network structure, which encodes the dependencies between variables, and the parameters of each variable's conditional distribution. In the second stage, similar to the EM algorithm, the variables with missing values are imputed iteratively using the Bayesian network learned in the first stage. The algorithm is proved to converge monotonically. Compared with other classical missing data imputation methods, the model has the following advantages: it exploits the inherent probabilistic dependencies between variables without assuming a specific probability distribution, and it is suitable for data with missing values in a single variable as well as in multiple variables. Three data sets are used in the experiments: one is a real data set from Renrendai, a well-known P2P financial company, and the other two are the benchmark data sets (German and Australian) provided by UCI. Experimental results show that, compared with other well-known missing value imputation techniques, the proposed model achieves better imputation accuracy and improves the performance of the credit scoring model. This indicates that the proposed method is better able to handle multivariable missing data in credit scoring.
Keywords/Search Tags: Credit scoring, Data preprocessing, Bayesian network, Missing data imputation
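
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's BNII implementation. It assumes discrete variables, a fixed hand-written network structure (the thesis learns the structure; here it is simply given), Laplace-smoothed conditional probability tables, and a greedy coordinate-wise update that sets each missing variable to its most probable value given its parents and children. The variable names (`employed`, `income`, `default`), the toy records, and the helper functions `fit_cpts` and `impute` are all hypothetical.

```python
from collections import Counter, defaultdict

def fit_cpts(rows, parents, alpha=1.0):
    """Stage 1 (sketch): learn smoothed CPTs from complete rows.
    The structure `parents` is assumed given, not learned."""
    values = {v: sorted({r[v] for r in rows}) for v in parents}
    counts = {v: defaultdict(Counter) for v in parents}
    for r in rows:
        for v, pa in parents.items():
            counts[v][tuple(r[p] for p in pa)][r[v]] += 1

    def prob(v, x, row):
        # P(v = x | parents of v as set in `row`), with Laplace smoothing.
        c = counts[v][tuple(row[p] for p in parents[v])]
        return (c[x] + alpha) / (sum(c.values()) + alpha * len(values[v]))

    return values, prob

def impute(row, parents, values, prob, modes, max_iter=20):
    """Stage 2 (sketch): iteratively re-impute each missing variable.
    Each update picks the value maximizing the factors that involve the
    variable, so the joint probability under the fitted network never drops."""
    filled = dict(row)
    missing = [v for v in parents if filled[v] is None]
    for v in missing:
        filled[v] = modes[v]  # initial guess: the column mode
    children = {v: [c for c, pa in parents.items() if v in pa] for v in parents}
    for _ in range(max_iter):
        changed = False
        for v in missing:
            def score(x):
                trial = {**filled, v: x}
                s = prob(v, x, trial)
                for c in children[v]:
                    s *= prob(c, trial[c], trial)
                return s
            best = max(values[v], key=score)
            if best != filled[v]:
                filled[v], changed = best, True
        if not changed:  # stop once a full pass changes nothing
            break
    return filled

# Hypothetical toy structure: employed -> income -> default.
parents = {"employed": [], "income": ["employed"], "default": ["income"]}
complete = (
    [{"employed": "yes", "income": "high", "default": "no"}] * 4
    + [{"employed": "yes", "income": "low", "default": "no"}]
    + [{"employed": "no", "income": "low", "default": "yes"}] * 3
    + [{"employed": "no", "income": "low", "default": "no"}]
    + [{"employed": "no", "income": "high", "default": "no"}]
)
values, prob = fit_cpts(complete, parents)
modes = {v: Counter(r[v] for r in complete).most_common(1)[0][0] for v in parents}

one_missing = impute({"employed": "yes", "income": None, "default": "no"},
                     parents, values, prob, modes)
two_missing = impute({"employed": "no", "income": None, "default": None},
                     parents, values, prob, modes)
print(one_missing["income"], two_missing["income"], two_missing["default"])
```

On this toy data the record with one missing field is filled with `income = high`, and the record with two missing fields converges to `income = low`, `default = yes` after the second pass. A faithful implementation would also need the structure learning step and the convergence argument that the abstract mentions, both of which this sketch leaves out.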