Research On Credit Risk Classification Based On Data Characteristics

Posted on: 2020-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: R T Zhou
GTID: 2438330602961893
Subject: Business Administration
Abstract/Summary:
Credit risk has become increasingly important in financial risk management. In the current era of information explosion, more and more factors are included in credit evaluation systems, and problems arising from certain data characteristics are becoming more serious in credit risk assessment. Among these, imbalance, missing data and sparseness are common, and each has a great impact on the results of credit risk classification. At present, most traditional modeling methods ignore these data characteristic problems and choose a model merely by trial and experience. For example, on highly imbalanced data, traditional algorithms tend to ignore the minority class and classify all samples into the majority class; some algorithms cannot process data with a high proportion of missing values; and under high sparsity, the performance of many classification algorithms degrades sharply or the algorithms fail altogether. Applying a model blindly, without considering the influence of data characteristics, therefore makes credit risk classification more time-consuming and less efficient. It also makes a suitable model harder to find, which reduces the credibility of the classification results, increases the risk of economic loss, lowers the efficiency of capital operations, and seriously hinders social and economic development.

To address this problem, this paper studies data characteristic-driven credit risk classification. It focuses on the imbalance, missing data and sparsity problems in credit risk classification, and introduces or improves data preprocessing methods and classification algorithms. A series of models is devised to handle these data characteristic problems more accurately, quickly and reasonably. The proposed models are also compared with traditional solutions to identify better methods and to determine which models are more appropriate as the severity of each data characteristic problem varies. The main contents and conclusions of this paper are summarized as follows.

Firstly, a resampling support vector machine based deep belief network (SVM-DBN) ensemble learning model is proposed for the problem of imbalanced data in credit risk classification. In this model, the resampling technique is first applied to train individual SVM classifiers; the outputs of these single classifiers are then taken as input, and the final result is obtained through a DBN-based ensemble strategy. The experimental results indicate that combining resampling with the DBN model as an ensemble strategy can effectively improve classification performance, especially on highly imbalanced data. In addition, the introduction of an income sensitivity matrix brings the classification problem closer to the real-world situation and makes the classification results more credible. The innovation of this model is that it is the first to apply DBN ensemble technology to the imbalance problem, and it proves to perform well on highly imbalanced data. The empirical results further indicate that the proposed resampling SVM-DBN ensemble model can serve as an effective tool for the data imbalance problem in credit risk assessment.

Secondly, for missing data in credit classification, a data preprocessing model based on one-hot encoding is proposed. Unlike traditional imputation methods, which try to replace missing values as accurately as possible, one-hot encoding treats the missing values in the dataset as a new category. This process reconstructs the original matrix with missing entries into a new, complete matrix, which is then passed to a traditional CART decision tree to obtain the final classification result. The study also compares the advantages and disadvantages of various traditional data imputation strategies at different missing rates. The empirical results show that, compared with the traditional methods, the one-hot encoding model achieves not only high accuracy but also outstanding robustness and efficiency. The innovation of this model is that it offers a new perspective on the missing data problem, one that avoids imputed values distorting the structure of the data. It also removes the effect that the arbitrary numeric codes assigned to a categorical feature would otherwise have on model training. The preprocessing model based on one-hot encoding proves to be an efficient and reliable method for dealing with missing data when the missing rate is high.

Finally, for the highly sparse data generated by the one-hot encoding described above, a restricted Boltzmann machine based principal component analysis (RBM-PCA) preprocessing model is constructed. In this model, the highly sparse one-hot data, which is difficult to process, is first densified and reduced in dimension by the reconstruction function of the RBM model, and is then combined with PCA for further feature extraction. The experimental results show that, compared with traditional dimension reduction and feature extraction methods, the RBM-PCA preprocessing model can efficiently obtain higher classification accuracy on the highly sparse one-hot encoded dataset. This paper is the first to focus on this type of highly sparse data generated by one-hot encoding and on the related processing method. The empirical results further show that the RBM-PCA preprocessing model can provide a reliable solution for credit classification with highly sparse data.

In summary, this paper carries out research on data characteristic-driven credit risk classification, covering the imbalance, missing data and sparseness issues. The designed models are tested on real-world credit data and show high accuracy, stability and efficiency, especially in the presence of these typical data characteristic problems, which gives the research both theoretical significance and application value.
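
The missing-value step described in the abstract, in which a missing entry becomes a category of its own instead of being imputed, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy applicant records, the `MISSING` label and the helper names `fit_categories` and `one_hot_encode` are all hypothetical, and the sketch assumes that every feature is categorical and that `None` marks a missing value.

```python
from typing import List, Optional, Sequence

# Hypothetical label for the extra "missing" category (an assumption, not from the thesis).
MISSING = "<missing>"

Row = Sequence[Optional[str]]


def fit_categories(rows: Sequence[Row], n_features: int) -> List[List[str]]:
    """Collect the sorted category list of each column, always including MISSING."""
    categories = []
    for j in range(n_features):
        seen = {MISSING if row[j] is None else str(row[j]) for row in rows}
        seen.add(MISSING)  # reserve the slot even if no value is missing in this column
        categories.append(sorted(seen))
    return categories


def one_hot_encode(rows: Sequence[Row], categories: List[List[str]]) -> List[List[int]]:
    """Encode each row as concatenated 0/1 indicators; a missing value sets the MISSING slot."""
    encoded = []
    for row in rows:
        vec: List[int] = []
        for j, cats in enumerate(categories):
            key = MISSING if row[j] is None else str(row[j])
            # A category never seen by fit_categories yields an all-zero group.
            vec.extend(1 if key == c else 0 for c in cats)
        encoded.append(vec)
    return encoded


# Toy applicants: (housing status, employment type); None marks a missing value.
applicants = [
    ("own", "salaried"),
    ("rent", None),
    (None, "self-employed"),
]
categories = fit_categories(applicants, n_features=2)
matrix = one_hot_encode(applicants, categories)

for row in matrix:
    print(row)
```

The encoded matrix contains no missing entries, so it could be handed directly to a tree classifier such as CART. It is also mostly zeros, which is the sparsity that the RBM-PCA stage in the abstract is meant to address.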
Keywords/Search Tags: credit risk classification, data characteristics, imbalance, data missing, sparsity