Unbalanced Data Classification In Credit Risk Assessment

Posted on: 2022-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: T Huang
GTID: 2518306347457664
Subject: Automation Technology
Abstract/Summary:
Credit default data form an unbalanced data set: defaulting users are far fewer than normal users, and misclassifying a defaulter as a normal user usually costs more than the opposite error. Existing research on classifying such data approaches the problem from three directions: data resampling, feature extraction, and modification of the classification algorithm. This thesis studies how to improve a classifier's accuracy on unbalanced data sets in two of these directions:

1) Data resampling. The thesis proposes using the variational autoencoder (VAE), a deep learning model, as a new sample generation algorithm: the VAE learns the distribution of the samples in the training set and generates new sample points from it. Numerical experiments show that, compared with SMOTE, which generates samples by linear interpolation, this approach significantly improves the model's classification accuracy.

2) Classification algorithm. Starting from cost-sensitive loss functions, the thesis studies the criterion under which a loss function can be made cost-sensitive while still satisfying the Bayes optimal decision condition, and proposes an exponential-loss AdaBoost algorithm that places the classification cost inside the loss function (AsyB-in). Experimental comparison with standard AdaBoost and the AsyB-out algorithm shows that AsyB-in improves classifier performance to a certain extent...
Keywords/Search Tags: Unbalanced data set, Variational autoencoder, Resampling, Cost-sensitive learning, Loss function, Boosting
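For reference, the SMOTE baseline that the abstract contrasts with the VAE generates each synthetic sample by linear interpolation between a minority point and one of its k nearest minority neighbours. The Python sketch below (standard library only) illustrates that idea; the function name `smote`, the default `k=5`, and the use of Euclidean distance are illustrative assumptions, not details taken from the thesis.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style linear interpolation.

    minority: list of feature vectors (lists of floats) from the minority (default) class.
    Hypothetical helper for illustration; not the thesis's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean), excluding the point itself
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # the new point lies on the segment between the base point and its neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, this kind of interpolation can only fill in space between observed defaulters, which is one reason to consider a generator that models the sample distribution instead.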
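The abstract does not describe the VAE architecture used in the thesis. As a minimal sketch under that caveat, the Python code below shows two standard VAE ingredients: reparameterized sampling of a latent code, z = mu + sigma * eps, and the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior, which is the regularization term of the VAE training objective. The encoder and decoder networks are omitted, and all function names are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, eps ~ N(0, I).

    Expressing the draw this way makes z a differentiable function of mu and
    log_var, which is what lets a VAE encoder be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), the VAE regularization term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def generate_latent_codes(n, latent_dim, seed=0):
    """Draw n latent codes from the N(0, I) prior.

    A trained decoder (hypothetical, not shown) would map each code to a new sample.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n)]
```

One plausible oversampling setup (an assumption, not confirmed by the abstract) is to train the VAE on minority (default) samples only and pass codes from `generate_latent_codes` through its decoder to obtain new default samples.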
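The Bayes optimal decision condition mentioned in the abstract can be illustrated with the standard cost-sensitive decision rule. Assuming a cost C_FN for labelling a defaulter as normal, a cost C_FP for the opposite error, and zero cost for correct decisions, predicting "default" has lower expected cost exactly when p * C_FN > (1 - p) * C_FP, where p is the posterior probability of default. The sketch also checks a cost-in-exponent loss, p * exp(-c_pos * f) + (1 - p) * exp(c_neg * f): setting its derivative to zero gives f* = ln(c_pos * p / (c_neg * (1 - p))) / (c_pos + c_neg), which is positive exactly when the Bayes rule predicts default (with c_pos = C_FN and c_neg = C_FP). The notation and function names are assumptions; the thesis may state its criterion differently.

```python
import math

def bayes_threshold(cost_fn, cost_fp):
    """Posterior P(default | x) above which predicting 'default' has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)

def bayes_decision(p_default, cost_fn, cost_fp):
    """True means predict 'default'.

    Expected cost is p * C_FN if we say 'normal' and (1 - p) * C_FP if we say 'default'.
    """
    return p_default * cost_fn > (1.0 - p_default) * cost_fp

def cost_exp_loss_minimizer(p, c_pos, c_neg):
    """Minimizer over f of p * exp(-c_pos * f) + (1 - p) * exp(c_neg * f), for 0 < p < 1."""
    return math.log((c_pos * p) / (c_neg * (1.0 - p))) / (c_pos + c_neg)
```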
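To make "classification cost inside the loss function" concrete, here is a minimal Python sketch of boosting decision stumps on the cost-weighted exponential loss sum_i exp(-C_i * y_i * F(x_i)), with labels y_i in {-1, +1} (+1 = default) and a per-sample cost C_i. It is not the thesis's AsyB-in algorithm, whose update rules the abstract does not give: here each stump is chosen to maximize the cost-weighted edge (the steepest-descent direction of this loss), and the step size alpha is found by a one-dimensional bisection on the convex stagewise loss rather than by a closed-form formula. All names, and the 10.0 cap on alpha, are illustrative assumptions.

```python
import math

def stump_predict(x, feature, threshold, polarity):
    """Decision stump: +polarity if x[feature] > threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity

def best_stump(X, y, w):
    """Pick the stump maximizing the weighted edge sum_i w_i * y_i * h(x_i)."""
    best, best_edge = None, -math.inf
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # candidate thresholds: one below every value, plus midpoints between neighbours
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for pol in (1, -1):
                edge = sum(wi * yi * stump_predict(xi, f, t, pol)
                           for xi, yi, wi in zip(X, y, w))
                if edge > best_edge:
                    best, best_edge = (f, t, pol), edge
    return best

def line_search(margins, d, costs, alpha_max=10.0, iters=60):
    """Minimize Z(a) = sum_i d_i * exp(-a * C_i * m_i) on [0, alpha_max] via bisection on Z'(a)."""
    def dz(a):
        return -sum(di * ci * mi * math.exp(-a * ci * mi)
                    for di, ci, mi in zip(d, costs, margins))
    if dz(0.0) > -1e-12:    # this stump cannot reduce the loss
        return 0.0
    if dz(alpha_max) <= 0:  # loss still decreasing at the cap (e.g. no mistakes)
        return alpha_max
    lo, hi = 0.0, alpha_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if dz(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def fit_cost_boost(X, y, costs, rounds=20):
    """Stagewise minimization of sum_i exp(-costs[i] * y[i] * F(x_i)), y[i] in {-1, +1}."""
    n = len(X)
    d = [1.0 / n] * n  # d_i is proportional to exp(-C_i * y_i * F(x_i))
    ensemble = []
    for _ in range(rounds):
        # steepest-descent direction: the weights include the cost factor C_i
        stump = best_stump(X, y, [di * ci for di, ci in zip(d, costs)])
        margins = [yi * stump_predict(xi, *stump) for xi, yi in zip(X, y)]
        alpha = line_search(margins, d, costs)
        if alpha == 0.0:
            break
        ensemble.append((alpha, stump))
        # the cost sits inside the exponent, so costly mistakes gain weight faster
        d = [di * math.exp(-alpha * ci * mi) for di, ci, mi in zip(d, costs, margins)]
        total = sum(d)
        d = [di / total for di in d]
    return ensemble

def predict(ensemble, x):
    """Sign of the ensemble score F(x); +1 means 'default'."""
    score = sum(a * stump_predict(x, *s) for a, s in ensemble)
    return 1 if score > 0 else -1
```

With unit costs this reduces to discrete AdaBoost (apart from the step cap), since the line search then recovers alpha = 0.5 * ln((1 - err) / err). For example, on four identical feature vectors with labels [-1, -1, -1, +1], unit costs yield a "normal" prediction, while giving the single default sample a cost of 5 flips the prediction to "default".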