Research On Personal Credit Evaluation Methods Under Imbalanced Data

Posted on: 2022-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Liu
GTID: 2518306575963649
Subject: Software engineering
Abstract/Summary:
With the continuous development of Internet finance, people's loan demands are gradually changing, and credit evaluation is playing an increasingly important role in the financial industry. Whether it is Alipay, Jingdong Baitiao or P2P online lending, before providing convenient services to users, a provider needs to build a personal credit evaluation model from users' basic information and historical behavior data in order to predict possible defaults. Applying machine learning models to personal credit evaluation is a common way to do this.

Credit evaluation currently faces two main problems. The first is data imbalance: in most credit data sets, good-credit samples far outnumber bad-credit samples, so the model's recognition rate for bad-credit samples is low. How to effectively improve the detection of bad-credit samples therefore has important research value. The second is that most classification models only consider the influence of high-order feature combinations on the final decision and ignore the classification gain brought by low-order feature combinations. In addition, credit evaluation data sets often suffer from high feature dimensionality, missing data and sparse features. To address these problems and improve both the recognition rate for users with bad credit and the accuracy of the model, this thesis studies the following three aspects:

1. The credit evaluation data are analyzed and preprocessed (missing-value handling, dimensionality reduction, feature standardization, etc.), and the data distribution is visualized.

2. To address data imbalance, a new method for handling the imbalance is designed around the sample characteristics of credit evaluation data, using the synthetic minority over-sampling technique (SMOTE). First, a k-means-based clustering method divides the samples into clusters. Then the ratio of good-credit to bad-credit samples in each cluster is checked in turn, and for every cluster whose ratio exceeds a threshold, SMOTE is used to generate data.

3. To address the problem that credit evaluation classifiers only consider high-order feature combinations, the FM model is combined with the GBDT2NN model: GBDT2NN captures the information gain from high-order feature combinations, and FM captures the information gain from low-order feature combinations. A logistic regression algorithm then combines the outputs of the two models to produce the final classification result.

Finally, AUC and recall are used as evaluation metrics, and several classification models, including GBDT2NN, LR and LightGBM, are selected for comparison. The experimental results demonstrate the effectiveness of the FM-GBDT2NN method in the field of credit evaluation.
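The cluster-wise oversampling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the abstract does not give the number of clusters, the threshold value, or how many synthetic samples are generated per cluster. The sketch assumes that a cluster is oversampled until its good-to-bad ratio drops to the threshold, and the names `kmeans`, `smote`, `cluster_smote`, `n_clusters` and `max_ratio` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def smote(X_min, n_new, k_neighbors=5, rng=None):
    """Create n_new points by interpolating between minority samples
    and their nearest minority neighbours (needs at least 2 samples)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_min)
    k = min(k_neighbors, n - 1)
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def cluster_smote(X, y, n_clusters=3, max_ratio=2.0, seed=0):
    """y == 1 marks bad credit (minority), y == 0 marks good credit."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, n_clusters, seed=seed)
    synthetic = []
    for c in range(n_clusters):
        in_c = labels == c
        bad = X[in_c & (y == 1)]
        n_good = int(np.sum(in_c & (y == 0)))
        if len(bad) < 2:
            continue  # too few bad samples to interpolate between
        if n_good / len(bad) > max_ratio:
            # Assumption: top up until good/bad is roughly max_ratio.
            n_new = int(np.ceil(n_good / max_ratio)) - len(bad)
            synthetic.append(smote(bad, n_new, rng=rng))
    if not synthetic:
        return X, y
    synth = np.vstack(synthetic)
    y_new = np.ones(len(synth), dtype=y.dtype)
    return np.vstack([X, synth]), np.concatenate([y, y_new])


# Toy data: 200 good-credit and 20 bad-credit samples with 2 features.
demo_rng = np.random.default_rng(0)
X_demo = np.vstack([demo_rng.normal(0.0, 1.0, (200, 2)),
                    demo_rng.normal(0.5, 1.0, (20, 2))])
y_demo = np.array([0] * 200 + [1] * 20)
X_res, y_res = cluster_smote(X_demo, y_demo)
print("bad-credit samples before:", int(y_demo.sum()), "after:", int(y_res.sum()))
```

Oversampling inside each cluster keeps the synthetic bad-credit samples close to real ones from the same region of feature space, which is the apparent motivation for clustering first. In practice a tested library implementation of k-means and SMOTE would likely be preferable to the hand-written versions above.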
Keywords/Search Tags: credit evaluation, machine learning, data imbalance, GBDT