Research On Credit Forecasting Of Hybrid Model Based On Imbalanced Data

Posted on: 2023-06-05

Degree: Master

Type: Thesis

Country: China

Candidate: J W Yu

Full Text: PDF

GTID: 2569306800960339

Subject: Computer technology

Abstract/Summary:

The rapid development of Internet finance has brought great convenience to people's lives, but it has also exposed them to considerable credit risk. Predicting fraudulent behavior among credit card applicants has become a major problem that financial institutions must solve, and it is against this background that financial risk prevention and control has emerged. Financial institutions can build risk assessment models for applicants, using applicants' personal information and social activity records to uncover potential risks and thereby reduce their own losses. However, credit prediction is a typical binary classification problem on imbalanced data: the class distribution is highly skewed and the feature space is high-dimensional. Traditional machine learning methods tend to favor the majority class on such data and therefore handle this imbalance poorly. This paper therefore focuses on the data imbalance problem, makes improvements at both the data level and the algorithm level, and builds a credit prediction model that combines a data balancing method with a Stacking fusion model. The main research contents are as follows:

(1) Constructing predictive features for the credit prediction model. Good predictive features are an important prerequisite for an algorithm to achieve strong prediction results. This paper first applies data mining techniques to preprocess the data used in the credit prediction model, then uses statistical methods to extract features from the preprocessed data. Finally, irrelevant features are removed by ranking feature importance with the random forest algorithm.

(2) At the data level, an improved SMOTE-ENN data balancing method is proposed. To address the marginal-distribution defect that SMOTE introduces when handling imbalanced data sets, in which synthetic minority samples generated near the class boundary blur that boundary, this paper improves SMOTE by combining the Borderline-SMOTE algorithm with the KNN algorithm, forming a SMOTE-ENN resampling method for imbalanced data sets. A horizontal comparison between the SMOTE-ENN method and existing sampling methods shows that SMOTE-ENN achieves better results.

(3) At the algorithm level, a credit prediction model based on Stacking fusion of multiple heterogeneous algorithms is proposed. This paper selects nine machine learning classification algorithms (KNN, Support Vector Machine, AdaBoost, Random Forest, XGBoost, Naive Bayes, CatBoost, Decision Tree, and LightGBM) and uses grid search to tune the hyperparameters of all nine models. After tuning, the three algorithms with the best classification performance are selected: random forest, XGBoost, and LightGBM. These three algorithms are combined with logistic regression through Stacking ensemble learning to form the Stacking fusion model, which is then paired with the SMOTE-ENN data balancing method to build the credit prediction model. Finally, the credit prediction model is compared with individual machine learning algorithms to verify its effectiveness. The experimental results show that its F1 score is higher than those of the other machine learning algorithms, indicating that it generalizes better and identifies fraudulent users more accurately.
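The feature-selection step in (1), which removes irrelevant features by ranking random-forest importance, can be sketched as follows. This is a minimal illustration, assuming scikit-learn and synthetic data from `make_classification` in place of the thesis's credit data set; the function name `rank_and_select` and the keep-if-at-least-the-mean cutoff are hypothetical choices, not the threshold used in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_and_select(X, y, n_estimators=200, random_state=0):
    """Rank features by random-forest impurity importance and keep the stronger ones."""
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X, y)
    importances = forest.feature_importances_       # normalized to sum to 1
    order = np.argsort(importances)[::-1]           # feature indices, most important first
    threshold = importances.mean()                  # hypothetical cutoff: the mean importance
    kept = [int(i) for i in order if importances[i] >= threshold]
    return kept, importances


# Synthetic stand-in for preprocessed credit features: 4 informative, 16 noise columns.
X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           n_redundant=0, weights=[0.9], random_state=0)
kept, importances = rank_and_select(X, y)
X_reduced = X[:, kept]
print(f"kept {len(kept)} of {X.shape[1]} features, ranked:", kept)
```

The mean cutoff mirrors the `threshold="mean"` option of scikit-learn's `SelectFromModel`; a fixed top-k or a cross-validated cutoff would be an equally reasonable choice.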
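The resampling idea in (2) can be sketched in plain NumPy: Borderline-SMOTE synthesizes new minority samples only from minority points near the class boundary ("danger" points), and Edited Nearest Neighbours (ENN) then removes samples whose k nearest neighbours mostly carry the other label. This is a simplified sketch under stated assumptions (Euclidean distance, oversampling to a 1:1 ratio, Wilson's majority-vote ENN applied to both classes); the function names and default parameters are hypothetical, and the thesis's specific improvements are not reproduced. The imbalanced-learn library provides `SMOTEENN`, `BorderlineSMOTE`, and `EditedNearestNeighbours` for production use.

```python
import numpy as np


def borderline_smote(X, y, minority=1, k=5, m=10, rng=None):
    """Borderline-SMOTE-1 sketch: interpolate only from minority points in 'danger'."""
    rng = np.random.default_rng(rng)
    idx_min = np.flatnonzero(y == minority)
    X_min = X[idx_min]
    n_needed = int(np.sum(y != minority)) - len(idx_min)  # oversample to a 1:1 ratio
    if n_needed <= 0:
        return X, y

    # For each minority point, count majority labels among its m nearest neighbours.
    d = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(idx_min)), idx_min] = np.inf  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :m]
    n_maj = np.sum(y[nn] != minority, axis=1)
    # 'Danger' = mostly but not entirely majority neighbours (all-majority = noise).
    danger = (n_maj >= m / 2) & (n_maj < m)
    if not danger.any():
        danger[:] = True  # assumption of this sketch: fall back to plain SMOTE

    # Interpolate between each danger seed and one of its k nearest minority neighbours.
    seed_idx = np.flatnonzero(danger)
    d_mm = np.linalg.norm(X_min[seed_idx][:, None, :] - X_min[None, :, :], axis=2)
    d_mm[np.arange(len(seed_idx)), seed_idx] = np.inf
    k_eff = min(k, len(idx_min) - 1)
    nn_min = np.argsort(d_mm, axis=1)[:, :k_eff]

    pick = rng.integers(0, len(seed_idx), size=n_needed)
    nbr = nn_min[pick, rng.integers(0, k_eff, size=n_needed)]
    gap = rng.random((n_needed, 1))
    seeds = X_min[seed_idx[pick]]
    synthetic = seeds + gap * (X_min[nbr] - seeds)

    return (np.vstack([X, synthetic]),
            np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)]))


def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop each sample whose label disagrees with most of its k neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    agree = np.sum(y[nn] == y[:, None], axis=1)
    keep = agree > k / 2
    return X[keep], y[keep]


def smote_enn(X, y, minority=1, rng=None):
    """Oversample with Borderline-SMOTE, then clean both classes with ENN."""
    X_res, y_res = borderline_smote(X, y, minority=minority, rng=rng)
    return edited_nearest_neighbours(X_res, y_res)


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X = np.vstack([gen.normal(0.0, 1.0, (90, 2)), gen.normal(1.5, 1.0, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    X_bal, y_bal = smote_enn(X, y, rng=0)
    print("class counts before:", np.bincount(y), "after:", np.bincount(y_bal))
```

Restricting interpolation to danger points is what distinguishes Borderline-SMOTE from plain SMOTE, and the ENN pass afterwards removes samples, synthetic or original, that land among points of the other class.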

Keywords/Search Tags:

Fraud, Credit prediction model, Imbalanced data, Resampling, Ensemble learning algorithm

Related items

1	Application Of RUSBoost Algorithm In Imbalanced Datasets
2	Interpretable Prediction Model Of Personal Credit Default Based On Ensemble Learning And SHAP Optimization
3	Research On The Data-driven Default Risk Prediction Approaches Of Consumer Finance
4	Research On Financial Crisis Prediction Model Of Listed Companies Based On Ensemble Learning Algorithm
5	The Study For Fraud Detection Of Credit Card Based On Imbalanced Data
6	Credit Card Fraud Detection Based On Ensemble Learning
7	Credit Risk Prediction Of Telecom Users Based On Improved Stacking Algorithm
8	Research On Prediction Of Return On Assets Based On Ensemble Learning
9	An Ensemble Learning Approach To Personal Credit Risk Assessment
10	Research On Multi-class Imbalanced Corporate Bond Default Risk Prediction Based On The Adaboost Ensemble Model