
Noise-data-characteristic-driven Credit Risk Classification

Posted on: 2021-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: X W Huang
GTID: 2428330605471772
Subject: Management Science and Engineering
Abstract
In recent years, advances in artificial intelligence and the rapid growth of Internet finance have brought new opportunities and challenges to credit risk assessment. Massive user data not only provides a basis for credit risk classification with artificial intelligence techniques, but also introduces a great deal of noise, which inevitably harms both the data processing and the results of credit risk assessment. At present, conventional handling of noise in credit data seldom considers the influence of data characteristics. This influence has two aspects: the effect of the noise characteristics themselves (the type and quantity of noise) on noise processing, and the effect of other data characteristics on noise processing. Ignoring the noise characteristics makes it difficult to design targeted cleaning schemes for different noise situations. Other data characteristics matter as well; for example, without accounting for class imbalance, it is difficult to determine accurately how noise affects the classification results. Noise cleaning that disregards data characteristics therefore lacks focus, generalizes poorly, and cleans less effectively. This in turn complicates subsequent credit risk classification, can reduce the credibility of its results, and may ultimately cause losses to financial institutions such as banks.

Against this background, this thesis studies credit risk classification driven by the characteristics of noisy data. It examines the influence of attribute noise, class noise and mixed noise on credit risk classification separately, and designs noise processing models suited to different noise characteristics in order to improve credit classification results. The main work and conclusions are as follows.

First, for attribute noise in credit data, a three-stage learning model based on secondary voting is proposed. In the first stage, four indexes are introduced to evaluate the noise level of each attribute. In the second stage, attributes are divided into attribute sets of different noise levels according to the voting results on their noise levels. In the third stage, different learning strategies and noise reduction methods are applied to the credit data sets built from the different attribute sets. A classification and regression tree (CART) serves as the final classifier for evaluating the training sets produced by the different learning strategies and noise reduction methods. This part also examines how all the learning strategies perform on sparse data sets with attribute noise. Experimental results show that the proposed model outperforms the benchmark models in the accuracy, stability and computation time of credit classification. Further analysis shows that, for specific noise reduction methods, sparse attribute-noise data can improve the stability of classification accuracy. The model makes two contributions. First, the secondary voting mechanism overcomes the instability of attribute noise level evaluations that rely on a single index. Second, the strategy of grouping attributes by noise level reduces the attribute noise level while retaining as much valuable information in the credit data as possible. The empirical results show that the secondary-voting-based three-stage learning model is an efficient and reliable method for handling attribute noise in credit risk classification.

Second, to address class noise in credit data, a two-stage learning model based on clustering and prediction is proposed. In the first stage, the K-means clustering algorithm is applied to data with different noise levels: based on the Euclidean distance between each sample and the data set center, it identifies some noisy samples and corrects their classes. In the second stage, a prediction-based noise reduction method further lowers the class noise level, with CART as the classifier that detects and corrects mislabeled samples according to the clustering results. Experimental results show that the model effectively detects and corrects class noise in credit data and improves credit risk classification, indicating that the proposed two-stage model based on clustering and prediction is an effective tool for handling class noise in credit risk classification.

Finally, a comprehensive solution is proposed for mixed noise in credit data. Building on Chapters 2 and 3, this thesis investigates how the noise processing steps affect one another under different noise levels, and compares the severity of these effects through the cleaning results. As described in those chapters, attribute noise processing consists of evaluating attribute noise levels and grouping attributes, while class noise processing consists of the cleaning strategy based on clustering and classification prediction. The mutual influence of these steps is analyzed by controlling the order in which they are applied. When the class noise level is low, class noise has little influence on the evaluation of attribute noise levels, so attribute noise can be reduced first to improve the subsequent cleaning of class noise. When the class noise level is high, class noise is processed first to reduce its impact on the evaluation of attribute noise levels. Based on the degree of these effects, different priorities are assigned to each processing step in different noise situations, and corresponding cleaning schemes are proposed for different mixed-noise situations. These schemes effectively reduce the level of mixed noise in credit data and thus provide a data basis for subsequent credit risk classification.

In summary, this thesis studies credit risk classification driven by the characteristics of noisy data. Based on the type and quantity of noise in credit data, it investigates the influence of attribute noise, class noise and mixed noise on credit risk classification in a big-data setting and proposes corresponding noise processing schemes. The schemes are tested on several real-world credit data sets, and the empirical results show that they effectively reduce the noise level of credit data and improve classification results. The study therefore has both theoretical significance and practical application value.
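The abstract does not name the four attribute noise indexes or spell out the voting rules, so the following Python is only a minimal sketch of one plausible reading of the secondary voting step in the attribute-noise model. It assumes each index produces a noisiness score per attribute (higher means noisier). In the first vote, each index ranks the attributes and cuts the ranking into equal bands. In the second vote, each attribute takes the majority level across indexes. The three-level scale, the equal-band cut, the tie-break toward the noisier level, and the names `LEVELS`, `first_round` and `secondary_vote` are all hypothetical.

```python
from collections import Counter

# Hypothetical three-level scale; the abstract does not state how many levels are used.
LEVELS = ("low", "medium", "high")


def first_round(scores):
    """First vote, cast by a single noise index.

    scores maps attribute name -> noisiness score (higher = noisier, assumed).
    Attributes are ranked by score and the ranking is cut into len(LEVELS)
    roughly equal bands, lowest scores in the "low" band.
    """
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {attr: LEVELS[pos * len(LEVELS) // n] for pos, attr in enumerate(ranked)}


def secondary_vote(index_scores):
    """Second vote: majority of the per-index levels for each attribute.

    index_scores is a list of score dicts, one per noise index. Ties are
    broken toward the noisier level (an assumption). Returns
    {level: [attributes]}, listing attributes in the first index's order.
    """
    rounds = [first_round(scores) for scores in index_scores]
    partition = {level: [] for level in LEVELS}
    for attr in index_scores[0]:
        counts = Counter(votes[attr] for votes in rounds)
        top = max(counts.values())
        tied = [level for level in LEVELS if counts[level] == top]
        partition[tied[-1]].append(attr)  # LEVELS is ordered, so last = noisiest
    return partition


if __name__ == "__main__":
    # Made-up scores from four hypothetical indexes over six attributes.
    example = [
        {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5, "f": 0.6},
        {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5, "f": 0.6},
        {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.6, "e": 0.4, "f": 0.5},
        {"a": 0.2, "b": 0.1, "c": 0.4, "d": 0.3, "e": 0.6, "f": 0.5},
    ]
    print(secondary_vote(example))
    # → {'low': ['a', 'b'], 'medium': ['c', 'd'], 'high': ['e', 'f']}
```

Voting on ranks rather than raw scores lets indexes measured on different scales count equally, which is one way to damp the single-index instability the abstract describes; the resulting partition would then feed the third-stage learning strategies.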
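The second stage of the class-noise model, prediction-based correction with CART, could look roughly like this using scikit-learn's `DecisionTreeClassifier` (an optimized CART implementation) and out-of-fold probabilities from `cross_val_predict`. The function name `cart_relabel`, the confidence threshold, tree depth, minimum leaf size and fold count are hypothetical. The abstract also says this stage works "according to the clustering results"; this sketch simplifies that by checking every sample, with `y` assumed to be the labels already corrected by the K-means stage.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def cart_relabel(X, y, n_splits=5, min_confidence=0.8, seed=0):
    """Prediction-based class-noise correction with a CART classifier.

    Each sample is scored by a tree trained on the other folds, so a noisy
    label never votes for itself. Samples whose out-of-fold prediction
    disagrees with their current label, with probability >= min_confidence,
    are flagged and relabeled. All defaults are hypothetical.
    """
    # min_samples_leaf keeps the tree from isolating single noisy samples.
    tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(tree, X, y, cv=cv, method="predict_proba")
    classes = np.unique(y)  # predict_proba columns follow this sorted order
    pred = classes[proba.argmax(axis=1)]
    confident = proba.max(axis=1) >= min_confidence
    flagged = np.flatnonzero((pred != y) & confident)
    y_new = y.copy()
    y_new[flagged] = pred[flagged]
    return y_new, flagged.tolist()
```

Out-of-fold prediction is the key design choice here: a tree that saw a sample during training can fit its noisy label and would then rarely flag it.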
Keywords: credit risk classification, data characteristics, attribute noise, class noise, mixed noise