Unbalanced Data Classification In Credit Risk Assessment

Posted on:2022-10-23

Degree:Master

Type:Thesis

Country:China

Candidate:T Huang

GTID:2518306347457664

Subject:Automation Technology

Abstract/Summary:

Credit default data sets are imbalanced: defaulting users are far fewer than normal users, and misclassifying a defaulting user as a normal user usually incurs the higher loss. Existing research on classifying such imbalanced data approaches the problem from three directions: data resampling, feature extraction, and modification of the classification algorithm. This thesis studies how to improve classifier accuracy on imbalanced data sets along two of these directions.

1) Data resampling. A new sample generation method based on the variational autoencoder (VAE), a deep learning model, is proposed. It learns the distribution of the samples in the training set and generates new sample points from that distribution. Numerical experiments show that this method significantly improves the classification accuracy of the model compared with the SMOTE algorithm, which generates samples by linear interpolation.

2) Classification algorithm. Starting from the cost-sensitive loss function, the thesis studies a criterion for cost-sensitive modification of the loss function that satisfies the Bayes-optimal decision condition, and proposes an exponential-loss AdaBoost algorithm that includes the classification cost in the loss function. Experimental comparison with the AdaBoost algorithm and the asyb out algorithm leads to the conclusion that the asyb in algorithm can improve classifier performance to a certain extent...
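The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract specifies neither the VAE generator nor the exact cost-sensitive AdaBoost variant, so the code shows only the SMOTE-style linear-interpolation baseline and a generic cost-weighted exponential loss. The function names, the parameters `k`, `seed` and `inside`, and the two placements of the cost (multiplying the loss, or inside the exponent) are assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by linear interpolation
    between a minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration, not the thesis's code."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        lam = rng.random()
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def cost_weighted_exp_loss(y, score, cost, inside=False):
    """Exponential loss with a misclassification cost (assumed forms).
    y is the label in {-1, +1}; score is the real-valued ensemble output.
    inside=False: cost * exp(-y * score)
    inside=True:  exp(-cost * y * score)"""
    margin = y * score
    if inside:
        return math.exp(-cost * margin)
    return cost * math.exp(-margin)


# Toy data: four minority (default) samples with two features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=5)

# A misclassified default (y = +1, score = -1) is penalised more heavily
# when its cost is larger.
loss_high_cost = cost_weighted_exp_loss(1, -1.0, cost=5.0)
loss_unit_cost = cost_weighted_exp_loss(1, -1.0, cost=1.0)
```

Because every synthetic point is a convex combination of two existing minority samples, this baseline can only fill in the segments between known points. A generative model such as a VAE instead samples from a learned distribution, which is the difference the abstract's comparison with SMOTE rests on.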

Keywords/Search Tags:

Unbalanced data set, Variational autoencoder, Resampling, Cost-sensitive learning, Loss function, Boosting

Related items

1	Research On Unbalanced Data Mining Algorithm Based On Cost-sensitive Learning
2	Studies On Cost-sensitive Regression Learning Of Small Dataset
3	D-MetaCost:An Efficient Multi-class Cost-sensitive Algorithm
4	Research On Cost-sensitive Industrial Anomaly Data Detection Method Based On F-value Optimization
5	Cost-sensitive boosting for classification of imbalanced data
6	Research On Imbalanced Data Classification Algorithms Based On Weight Analysis Of Loss Function
7	Research On Classification And Application Of Unbalanced Data Based On Resampling And Ensemble Learning
8	Research Of Ensemble Classification Methods For Class-imbalance And Cost-sensitive Datasets
9	Research On Federated Learning Methods For Unbalanced Data
10	Research Of Boosting Classification Algorithm For Imbalanced Data