
Research On Personal Credit Risk Assessment Based On Stacking Selective Integration Algorithm

Posted on: 2021-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: T Luo
GTID: 2370330623465688
Subject: Applied statistics
Abstract/Summary:
With the rapid development of China's credit economy, credit consumption, personal unsecured loans and similar businesses account for a growing share of the activity of financial institutions, and the use of consumer credit in China's economy and in people's daily lives has expanded greatly. How to balance the scale of consumer credit business against the personal credit default rate is a central technical question in current academic research, and it is also a strategic problem that commercial banks and other Chinese financial institutions must solve as they develop. In the final analysis, these problems come down to how financial institutions manage the risk of personal credit rationing. One of the main technical difficulties in allocating personal credit is selecting individual borrowers scientifically and accurately, and personal credit risk assessment addresses exactly this problem. Quantifying credit risk by applying machine learning methods to individuals' basic attributes and borrowing-related information is therefore of great significance for solving the personal credit rationing problem.

This thesis first selects the Give Me Some Credit data set from the Kaggle competition platform and carries out extensive preprocessing and descriptive analysis. In Chapter 4, missing values are filled with the mode, based on the distribution of the features with missing values, and extreme outliers are removed according to the boxplot method. A heat map of the feature correlation coefficients is also drawn, which reveals collinearity among the three overdue-count features. The influence of this collinearity is eliminated by retaining the important feature and taking the ratio of the other two. The descriptive analysis shows that most features have long tails, so they are log-transformed, and sparse values in the features are counted and binned.

Secondly, feature selection is performed on the preprocessed data. The correlation between the default label and each feature, and the importance of each feature, are calculated with the filter method and the embedded method, and the two are combined by weighted averaging into a comprehensive score. Based on a line chart of these scores, the top 19 features are selected to build an indicator system for personal credit risk assessment, and the SMOTE algorithm is used to balance the class-imbalanced sample data.

Finally, single personal credit risk assessment models are built with machine learning algorithms such as Logistic Regression, RandomForest, ANN, AdaBoost, and XGBoost. Five relatively good single models are selected according to their evaluation accuracy, algorithmic limitations, and adaptability. These five models are then combined in three ways. The first is the Maximum Voting method. The second is Weighted Averaging, in which each model is assigned a weight derived from the comprehensive score of its evaluation metrics. The third is the Stacking ensemble algorithm, which uses the five selected single models as base classifiers and a logistic regression meta-model trained on the base classifiers' outputs. The Pipeline function in the Scikit-learn library is also used to add a workflow pipeline layer to the output of the selective ensemble model, standardizing and correcting the output.

A comparison of the selective ensemble models with the single models shows that the Stacking selective ensemble algorithm performs well in assessing whether a borrower will default, in terms of evaluation accuracy, robustness, and adaptability, with the reduction in the Logloss metric being particularly pronounced. It can therefore be concluded that the Stacking selective ensemble algorithm combines the strengths of the individual classification algorithms and has considerable application value for personal credit risk assessment.
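
The stacking step described in the abstract can be illustrated with a minimal sketch using Scikit-learn's `StackingClassifier`. This is not the thesis's implementation. It rests on several assumptions: synthetic imbalanced data from `make_classification` stands in for the Give Me Some Credit data, three base learners are used instead of the five selected models, `GradientBoostingClassifier` stands in for XGBoost, and SMOTE is omitted because it requires the separate imbalanced-learn package. The names `base_learners`, `stack`, and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data with roughly 10% positives stands in for the
# credit data set, which is not reproduced here.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical base learners (the thesis selects five; three are used for brevity).
# The logistic regression is wrapped in a Pipeline so its inputs are standardized.
base_learners = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gboost", GradientBoostingClassifier(random_state=0)),
]

# The meta-model is a logistic regression trained on out-of-fold predicted
# probabilities from the base learners (5-fold cross-validation).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)


def evaluate(models):
    """Fit each (name, model) pair and return its test-set log loss."""
    scores = {}
    for name, model in models:
        model.fit(X_train, y_train)
        scores[name] = log_loss(y_test, model.predict_proba(X_test))
    return scores


scores = evaluate(base_learners + [("stacking", stack)])
for name, value in scores.items():
    print(f"{name}: log loss = {value:.4f}")
```

The printed values allow a single-model versus stacked-model comparison on the Logloss metric, in the spirit of the comparison the abstract reports. Results on synthetic data say nothing about the thesis's actual figures.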
Keywords/Search Tags: personal credit risk, credit rationing, class imbalance, stacking, selective ensemble algorithm