
Research On Credit Evaluation Based On Improved Oversampling Method And Adaptive Ensemble Model

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Yu
GTID: 2518306542951229
Subject: Master of Applied Statistics

Abstract/Summary:
With the application of big data technology in the financial industry, many innovative Internet financial products have been widely promoted, but they also carry considerable potential risk. Credit evaluation is an essential part of Internet finance, and using big data to identify defaulting borrowers efficiently and accurately is the main way for credit risk evaluation to mitigate these risks. In practice, defaulting borrowers are far outnumbered by normal borrowers, so credit risk data are mostly imbalanced and must be rebalanced before modeling.

SMOTE (Synthetic Minority Over-sampling Technique) and similar oversampling methods generate the same number of new samples for every minority sample and can produce noisy samples near the class boundary. To address these shortcomings, this paper presents an improved SMOTE oversampling method. First, the quality of each minority sample is determined from its location. Then, the number of new samples to generate from each minority sample is calculated according to its quality. Finally, new samples are generated along the line between each minority sample and the center of the minority samples, which moves the new samples toward the center and keeps them out of the ambiguous region near the classification boundary. Experiments on UCI datasets show that the improved SMOTE effectively improves both the quality of the synthetic samples and classification accuracy.

Ensemble models are a class of machine learning methods widely used in credit evaluation because of their accuracy and efficiency. Stacking (stacked generalization) is a high-performance ensemble model that performs well in credit risk evaluation. To reduce the Stacking model's tendency to overfit while preserving its accuracy as far as possible, this paper proposes selecting the base models of the ensemble adaptively, before training, according to the JC (Jaccard and Cosine similarity) index. The selected base models must be not only accurate but also sufficiently different from one another.

The Lending Club dataset is used for the empirical analysis, and two types of experiments were carried out to validate the models. The first compares standard SMOTE with the improved SMOTE under the Stacking model; the results show that the improved method generates higher-quality minority samples. The second compares Stacking models built from different base classifiers; the results show that the Stacking model composed of base classifiers selected by the JC index performs better.
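The three oversampling steps described in the abstract (score each minority sample by its location, allocate synthetic samples by that score, interpolate toward the center) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact formulation: the quality score (share of minority samples among the k nearest neighbours), the proportional allocation rule, and taking the minority-class centroid as the "center" are all assumptions, and the names `sample_quality`, `allocate` and `center_smote` are hypothetical.

```python
import numpy as np


def sample_quality(X, y, minority_label, k=5):
    """Hypothetical quality score for each minority sample: the share of minority
    samples among its k nearest neighbours. A sample surrounded only by majority
    samples (likely noise) scores 0; a sample deep inside its class scores 1."""
    X_min = X[y == minority_label]
    # Pairwise Euclidean distances from every minority sample to every sample.
    dist = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of the sorted order is the sample itself (distance 0, assuming
    # no duplicate points), so keep columns 1..k.
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]
    return (y[neighbours] == minority_label).mean(axis=1)


def allocate(quality, n_new):
    """Split n_new synthetic samples across minority samples in proportion to
    their quality (assumed rule); zero-quality samples receive none."""
    if quality.sum() == 0:
        return np.zeros(len(quality), dtype=int)
    raw = quality / quality.sum() * n_new
    counts = np.floor(raw).astype(int)
    # Give the leftover samples to the largest fractional parts so that
    # counts.sum() == n_new.
    leftover = n_new - counts.sum()
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts


def center_smote(X, y, minority_label, n_new, k=5, seed=0):
    """Oversample the minority class by interpolating selected minority samples
    towards the minority-class centroid (assumed to be the 'center')."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    center = X_min.mean(axis=0)
    counts = allocate(sample_quality(X, y, minority_label, k), n_new)
    # Repeat each minority sample as many times as it was allotted.
    base = np.repeat(X_min, counts, axis=0)
    # x_new = x + u * (center - x) with u ~ U[0, 1): every new point lies on the
    # segment from its parent sample to the center.
    u = rng.random((len(base), 1))
    X_new = base + u * (center - base)
    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.full(len(X_new), minority_label, dtype=y.dtype)])
    return X_out, y_out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (95, 2)), rng.normal(3.0, 0.5, (5, 2))])
    y = np.array([0] * 95 + [1] * 5)
    X_bal, y_bal = center_smote(X, y, minority_label=1, n_new=90)
    print(np.bincount(y_bal))  # class counts after oversampling
```

In this sketch, giving isolated (zero-quality) minority samples no synthetic offspring is what keeps boundary noise from being amplified, and pulling each new point toward the center keeps it away from the ambiguous region between classes.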
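The adaptive base-model selection can likewise be sketched, with heavy caveats: the abstract does not define the JC index, so averaging the Jaccard similarity of two models' error sets with the cosine similarity of their residual vectors, the greedy accuracy-first search, and the cut-off `max_jc = 0.8` are all assumptions, and the function names are hypothetical. The functions operate on predicted probabilities from a held-out validation set (or out-of-fold predictions), so selection happens before the Stacking model itself is trained.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard similarity |a AND b| / |a OR b| of two boolean vectors (1.0 if both are empty)."""
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union


def cosine(u, v):
    """Cosine similarity of two real vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 0.0 if denom == 0 else float(u @ v / denom)


def jc_index(p1, p2, y_true, threshold=0.5):
    """Hypothetical JC index of two models: the mean of (a) the Jaccard similarity
    of the sets of samples each model misclassifies and (b) the cosine similarity
    of their residual vectors p - y. Higher means the models err alike."""
    err1 = (p1 >= threshold) != y_true
    err2 = (p2 >= threshold) != y_true
    return 0.5 * (jaccard(err1, err2) + cosine(p1 - y_true, p2 - y_true))


def select_base_models(probas, y_true, max_jc=0.8, threshold=0.5):
    """Greedy selection: visit candidates from most to least accurate and keep a
    model only if its JC index with every model already kept is <= max_jc."""
    acc = {name: np.mean((p >= threshold) == y_true) for name, p in probas.items()}
    selected = []
    # sorted() is stable, so equally accurate models keep their input order.
    for name in sorted(acc, key=acc.get, reverse=True):
        if all(jc_index(probas[name], probas[s], y_true, threshold) <= max_jc
               for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
    probas = {  # held-out probabilities from three hypothetical base models
        "a": np.array([0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.1, 0.2, 0.4]),
        "b": np.array([0.85, 0.8, 0.7, 0.65, 0.1, 0.2, 0.3, 0.15, 0.2, 0.4]),
        "c": np.array([0.3, 0.9, 0.6, 0.9, 0.7, 0.1, 0.45, 0.3, 0.05, 0.1]),
    }
    print(select_base_models(probas, y))  # ['a', 'c']
```

Visiting the most accurate models first and rejecting near-duplicates mirrors the abstract's requirement that base models be both accurate and mutually different: in the example, `b` is as accurate as `a` but errs in almost exactly the same way, so it is dropped, while the less accurate but differently-erring `c` is kept.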
Keywords/Search Tags: Imbalanced data, Credit risk, Oversampling, Stacking model, Lending Club