
Research On The Prediction Of Personal Loan Default Based On Hybrid Feature Selection And Heterogeneous Ensemble

Posted on: 2021-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Sui
GTID: 2428330611455260
Subject: Computer technology
Abstract/Summary:
With the continued development of internet finance, many personal loan companies have gone bankrupt because of insufficient risk control. As an effective risk control method, a borrower risk analysis model can use the borrower's personal information and social activity data to estimate the borrower's default risk. This thesis addresses three problems: class imbalance among borrowers, high feature dimensionality, and the high cost to enterprises of acquiring data. It applies feature selection and class balancing to credit data and builds an ensemble learning model of the borrower's default probability. The contributions of the thesis are as follows:

(1) This thesis proposes a hierarchical framework for feature selection, consisting of coarse-grained and fine-grained feature selection. In the coarse-grained stage, the Relief_S algorithm gives more attention to minority-class samples and selects the features that discriminate them most strongly, which improves the model's predictive performance on imbalanced data. The coarse-grained stage also uses the Pearson method to select features. The remaining features are then filtered by a fine-grained selection algorithm. Compared with applying the fine-grained selection algorithm directly, hierarchical feature selection significantly improves screening efficiency while preserving model performance.

(2) In the fine-grained stage, specific feature selection methods are proposed for different prediction models, based on the characteristics of each model. Because the performance of the logistic regression model depends heavily on the credit features, the IKP_Lasso feature selection algorithm is proposed. IKP_Lasso evaluates credit features from multiple angles, so that credit information is not ignored because of reliance on a single evaluation indicator. The LightGBM-RRFE algorithm filters the feature subset for the LightGBM model and overcomes the instability in the ranking of redundant features. For the neural network model, LightGBM-RRFE is also used as the feature selection method, combined with empirical judgment: features with strong interpretability are selected to improve reliability.

(3) Based on practical work experience, this thesis summarizes methods for handling outliers and missing values, as well as feature engineering methods for credit features suited to small-loan scenarios. Because the Logistic Regression model is heavily influenced by its input features, chi-square binning and WOE transformation are applied to continuous features to increase their stability and interpretability and to introduce nonlinear information.

Finally, this thesis uses different credit feature subsets to establish default prediction models based on the Logistic Regression model, the LightGBM model and the Neural Network model respectively, and then carries out a hierarchical fusion of the three models. Compared with other credit evaluation models, the default prediction model based on hybrid feature extraction achieves the best predictive performance. This thesis also verifies the generality and superiority of the LightGBM-RRFE algorithm on several real data sets.
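The WOE (weight of evidence) transformation named in contribution (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `woe_table` and `apply_woe`, the `smoothing` parameter, and the sign convention WOE = ln(share of non-defaults / share of defaults) are all assumptions made for illustration. The chi-square binning step is not shown, so each sample is assumed to carry a bin label already.

```python
import math
from collections import Counter


def woe_table(bins, labels, smoothing=0.5):
    """Map each bin to its weight of evidence.

    Assumed convention: label 1 = default, label 0 = non-default, and
    WOE = ln(share of non-defaults in bin / share of defaults in bin).
    `smoothing` is a hypothetical additive constant that avoids log(0)
    when a bin contains only one class.
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    keys = sorted(set(bins))
    total_good = sum(good[k] + smoothing for k in keys)
    total_bad = sum(bad[k] + smoothing for k in keys)
    return {
        k: math.log(((good[k] + smoothing) / total_good)
                    / ((bad[k] + smoothing) / total_bad))
        for k in keys
    }


def apply_woe(bins, table):
    """Replace each bin label with its WOE value."""
    return [table[b] for b in bins]


# Toy data: two income bins with made-up default labels.
bins = ["low", "low", "low", "high", "high", "high"]
labels = [0, 0, 1, 1, 1, 0]
table = woe_table(bins, labels, smoothing=0.0)
encoded = apply_woe(bins, table)
```

Because every value in a bin maps to the same number, the encoded feature is monotone in the bin's default rate, which is one reason this encoding is commonly paired with logistic regression.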
Keywords/Search Tags:internet finance, risk control, imbalanced data processing, feature selection, model fusion