Research On Individual Credit Risk Assessment For Imbalanced Data

Posted on:2021-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhou

GTID:2428330623976448

Subject:Engineering

Abstract/Summary:

Internet technology has given rise to many emerging industries and driven the rapid development of Internet finance. Through products such as JD Credit Pay, Ant Credit Pay, and P2P lending, consumer credit has become part of everyday life. Before offering users convenient and reliable services, many Internet credit products need a personal credit risk assessment model, built from users' basic information and historical transaction data, to predict the risk of default. Machine learning is a common way to build such a model. Credit data, however, are usually class-imbalanced. When a traditional classification algorithm is trained on imbalanced data, minority-class samples tend to be misclassified as the majority class, which degrades prediction quality, even though correctly identifying minority-class samples is what matters most in practice. Classifying imbalanced data effectively is therefore of considerable research value. Credit data are also high-dimensional and contain many redundant features, which raises the question of how to select a feature subset that retains the most information and the least noise while maximizing the model's generalization ability and reducing training time. Against this background, this thesis proposes an improved data resampling method and a feature selection method for high-dimensional imbalanced credit data, aimed at improving the recognition rate of minority-class samples, and builds an individual credit risk assessment model with gcForest. The specific research contents are as follows:

(1) Balancing the data by oversampling. An improved ADASYN oversampling method based on the HVDM distance measure is proposed to improve the efficiency and rationality of generating new samples during oversampling.

(2) A feature selection algorithm based on the minimum-redundancy, maximum-relevance (mRMR) idea is proposed. The AUC of each single feature is used as the measure of feature importance, and a feature subset that is highly informative and contains few redundant features is selected by computing the Kendall correlation coefficient between features.

(3) Using imbalanced credit data sets from China and abroad, the deep forest algorithm gcForest is used to construct the individual credit risk assessment model. The cascade structure of the deep forest is improved by adding XGBoost to the base classifiers in each cascade layer, which enriches the variety of base classifiers and further strengthens the forest's ability to identify minority-class samples. The result is an individual credit risk assessment model for imbalanced credit data.
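The feature selection idea in (2) can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes a simple greedy rule: rank features by how far their single-feature AUC lies from 0.5, then keep a feature only if its absolute Kendall correlation with every feature already kept stays below a threshold. The function names, the threshold `max_abs_tau=0.8`, the use of Kendall tau-a, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def single_feature_auc(values, labels):
    """AUC obtained by using one raw feature as the score (ties count 0.5)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    score = 0
    for i, j in combinations(range(len(a)), 2):
        prod = (a[i] - a[j]) * (b[i] - b[j])
        score += (prod > 0) - (prod < 0)
    return score / (len(a) * (len(a) - 1) // 2)


def select_features(columns, labels, max_abs_tau=0.8):
    """Greedy selection: most relevant first, skip redundant features.

    Relevance is |AUC - 0.5|, so a feature that ranks the minority class
    low is treated as just as useful as one that ranks it high.
    The threshold max_abs_tau is an assumed, illustrative value.
    """
    relevance = {name: abs(single_feature_auc(col, labels) - 0.5)
                 for name, col in columns.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if all(abs(kendall_tau(columns[name], columns[kept])) < max_abs_tau
               for kept in selected):
            selected.append(name)
    return selected


# Toy, hypothetical data: 6 majority-class (0) and 2 minority-class (1) samples.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
columns = {
    "income": [9, 8, 7, 6, 5, 4, 2, 1],
    "income_rounded": [9, 8, 7, 6, 5, 2, 4, 1],  # near-duplicate of income
    "age": [30, 45, 28, 50, 33, 55, 52, 47],
}
print(select_features(columns, labels))
```

On this toy data the near-duplicate column is skipped because it is almost perfectly rank-correlated with a more relevant feature that was already kept. For real data sets, library routines such as `sklearn.metrics.roc_auc_score` and `scipy.stats.kendalltau` would be the usual choice over these quadratic-time helpers.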

Keywords/Search Tags:

Personal credit assessment, Imbalanced classification, Feature selection, gcForest

Related items

1	Research On GA-based Subspace Classification And Its Application To Personal Credit Evaluation
2	Research On Credit Assessment Of Micro And Small Enterprises For Unbalanced Data
3	Research And Application Of Imbalanced Classification Technology Based On GcForest
4	Personal Credit Scoring Based On Imbalanced Ensemble Classification
5	Neural Network-based Personal Credit Risk Assessment Model
6	Research On Personal Credit Evaluation Based On Credit Platform Data
7	The Comparative Research Of Personal Credit Assessment Model Based On BP Neural Network And SVM
8	Research On Risk Assessment Model Of Personal Credit Application Based On Co-forest Algorithm
9	Adaptive KNN Classification Algorithm And Its Application In Personal Credit Risk Assessment
10	Feature Selection And Classification For Imbalanced Medical Data