
Research On Credit Forecasting Of Hybrid Model Based On Imbalanced Data

Posted on: 2023-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yu
GTID: 2569306800960339
Subject: Computer technology
Abstract/Summary:
The rapid development of Internet finance has brought great convenience to people’s lives, but it has also exposed large credit risks. Predicting fraudulent behavior by credit card applicants has become a major problem for financial institutions, and financial risk prevention and control emerged against this background. Financial institutions can build risk assessment models that use applicants’ personal information and social activity traces to uncover potential risks, thereby reducing their own losses. Credit prediction, however, is a typical binary classification problem on imbalanced data: the classes are imbalanced and the feature dimension is high. Traditional machine learning methods handle such imbalanced data poorly. This thesis therefore focuses on the data imbalance problem, makes improvements at both the data level and the algorithm level, and builds a credit prediction model by combining a data balancing method with a Stacking fusion model. The main research contents are as follows:

(1) Constructing predictive features for the credit prediction model. Good predictive features are an important prerequisite for an algorithm to achieve good prediction results. The thesis first applies data mining techniques to preprocess the data, then uses statistical methods to extract features from the preprocessed data. Finally, the features are ranked by importance using the random forest algorithm, and irrelevant features are removed.

(2) At the data level, an improved SMOTE-ENN data balancing method is proposed. To address the marginal distribution defect that the SMOTE algorithm introduces when handling imbalanced datasets, the thesis improves SMOTE by combining the Borderline SMOTE algorithm with the KNN algorithm, forming a SMOTE-ENN resampling method for imbalanced datasets. A side-by-side comparison with existing sampling methods shows that the SMOTE-ENN resampling method gives better results.

(3) At the algorithm level, a credit prediction model based on Stacking fusion of multiple heterogeneous algorithms is proposed. Nine machine learning classification algorithms are considered: KNN, Support Vector Machine, AdaBoost, Random Forest, XGBoost, Naive Bayes, CatBoost, Decision Tree and LightGBM. Grid search is used to tune the hyperparameters of all nine models. After tuning, the three algorithms with the best classification performance are selected, namely Random Forest, XGBoost and LightGBM. These three algorithms are integrated with the logistic regression algorithm through Stacking ensemble learning to form the Stacking fusion model, which is then combined with the SMOTE-ENN data balancing method to build the credit prediction model. Finally, the credit prediction model is compared with single machine learning algorithms to verify its validity. The experimental results show that the F1 value of the credit prediction model is higher than that of the other machine learning algorithms. Compared with those algorithms, the credit prediction model therefore generalizes better and identifies fraudulent users more accurately.
Keywords/Search Tags:Fraud, Credit prediction model, Imbalanced data, Resampling, Ensemble learning algorithm
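
As a rough illustration of the resampling idea in item (2) of the abstract, the following is a minimal sketch of plain SMOTE oversampling followed by Edited Nearest Neighbours (ENN) cleaning, written with the Python standard library only. It is not the improved variant proposed in the thesis (which combines Borderline SMOTE with KNN). The function names, the toy data and the choice k=3 are assumptions made for this example, and the sketch assumes at least two minority samples.

```python
import math
import random


def knn_indices(point, points, k, skip=None):
    """Indices of the k points nearest to `point` (Euclidean), excluding `skip`."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(point, points[i]),
    )
    return order[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority[i], minority, k, skip=i))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def enn(X, y, k):
    """Keep only samples whose label agrees with a strict majority of their
    k nearest neighbours (here applied to both classes)."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = [y[j] for j in knn_indices(x, X, k, skip=i)]
        if votes.count(label) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size, then clean with ENN."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_major = len(y) - len(minority)
    new = smote(minority, n_major - len(minority), k, random.Random(seed))
    X_bal = list(X) + new
    y_bal = list(y) + [minority_label] * len(new)
    return enn(X_bal, y_bal, k)


# Toy data (hypothetical): 8 majority points near the origin, 1 majority
# point sitting inside the minority region, and 3 minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5), (0.5, 0), (1, 0.5),
     (5.5, 5.5),
     (5, 5), (5, 6), (6, 5)]
y = [0] * 9 + [1] * 3

X_res, y_res = smote_enn(X, y)
print(y_res.count(0), y_res.count(1))  # prints: 8 9
```

In this toy run SMOTE adds six synthetic minority points, and ENN then removes the single majority point that lies inside the minority region, since all of its nearest neighbours belong to the other class.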