Research On Customer Churn Prediction Of Commercial Banks Based On Mixed Feature Selection

Posted on: 2023-02-07; Degree: Master; Type: Thesis
Country: China; Candidate: Y Jin
GTID: 2568306842471724; Subject: Applied Statistics
Abstract/Summary:
Against a backdrop of a rapidly changing economic situation and increasingly stringent financial regulation, retail business has become an issue that major banks cannot avoid. Individual customers, the cornerstone of retail business development, are now the core of the competition between the banking industry and online finance, and customer churn at commercial banks has become increasingly serious. It is therefore of great significance for the survival and development of banks to identify the causal factors in large volumes of customer information and to establish an early-warning model for customer churn. Based on the current situation of China's commercial banks and the shortcomings of existing risk control evaluation methods, this thesis takes the customer information data of a commercial bank as an example, addresses the limitations of filter and wrapper methods, and proposes a customer churn prediction model based on a mixed feature selection method. The main contents of the thesis are as follows:

(1) First, the thesis analyzes the categorical and numerical features visually and identifies the main factors associated with customer churn. Based on the feature distributions, it retains outliers and treats the default value as a feature value in its own right. Categorical features are then encoded as numerical features in a way that suits tree models.

(2) Second, to account for correlation and redundancy among high-dimensional features, the thesis improves the measurement criterion of the maximum correlation and minimum redundancy (mRMR) algorithm by replacing mutual information with the maximal information coefficient (MIC). This addresses the original algorithm's inefficiency on large samples and its inability to measure the correlation between continuous features accurately. By progressively deleting the lowest-ranked features, the thesis sets a cutoff at the point where the model's predictive performance drops sharply, so that redundant and weakly predictive features are removed quickly. The improved algorithm is more effective than single-index filter algorithms.

(3) Third, recursive feature elimination with cross-validation and the Boruta algorithm are used for a second round of feature screening on four ensemble models (XGBoost, LightGBM, CatBoost, and random forest). Compared with the feature importance scores of the original tree models, these two algorithms reduce the mutual influence of coupled features and avoid overestimating the importance of random features. While keeping any loss of predictive performance as small as possible, the thesis selects a separate optimal feature subset for each model. The original 625 features are reduced to between 14 and 89, which preserves the benefit of feature selection and makes model training more efficient.

(4) Based on these results, the thesis obtains the optimal hyperparameters of each model through Bayesian optimization. To combine the strengths of the individual models and improve predictive performance and robustness, it exploits differences in features, models, and parameters and uses a stacking framework with 5-fold cross-validation to build a customer churn prediction model based on differentiated feature sets. Compared with the single models and with a voting-based fusion model, stacking achieves the highest prediction accuracy and identifies churned customers better.

Finally, the thesis outlines directions for future research in data balancing, data timeliness, and the scale of model integration.
Keywords/Search Tags:customer churn, feature selection, ensemble learning, model fusion
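
The abstract's steps (3) and (4) can be illustrated with a small sketch: recursive feature elimination with cross-validation picks a separate feature subset for each model, and the models are then combined by stacking with 5-fold cross-validation. This is not the thesis's implementation. It assumes scikit-learn only, uses synthetic data in place of the bank's customer table, and substitutes scikit-learn's random forest and gradient boosting for the XGBoost, LightGBM, and CatBoost models. The mRMR/MIC filter, Boruta, and Bayesian optimization are left out, and the helper names `select_columns` and `on_subset` are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical data: a synthetic, imbalanced stand-in for the customer table.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, n_redundant=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-ins for the thesis's tree ensembles (an assumption of this sketch).
models = {
    "rf": RandomForestClassifier(n_estimators=30, random_state=0),
    "gb": GradientBoostingClassifier(n_estimators=30, random_state=0),
}


def select_columns(estimator):
    """Run cross-validated recursive feature elimination for one model
    and return the indices of the features it keeps."""
    selector = RFECV(
        clone(estimator),
        step=2,  # drop two features per elimination round
        min_features_to_select=3,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        scoring="roc_auc",
    )
    selector.fit(X_train, y_train)
    return np.flatnonzero(selector.support_).tolist()


# Each model gets its own feature subset (a "differentiated feature set").
subsets = {name: select_columns(model) for name, model in models.items()}


def on_subset(estimator, columns):
    """Pipeline that feeds only the chosen columns to a fresh copy of the model."""
    keep = ColumnTransformer([("keep", "passthrough", columns)])
    return make_pipeline(keep, clone(estimator))


stack = StackingClassifier(
    estimators=[(name, on_subset(models[name], subsets[name])) for name in models],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base predictions are used to train the meta-learner
)
stack.fit(X_train, y_train)

churn_proba = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_proba)
print({name: len(cols) for name, cols in subsets.items()}, round(auc, 3))
```

For brevity, the feature subsets here are chosen once on the training split before stacking. Nesting the selection inside each cross-validation fold would give a stricter performance estimate, at a higher computational cost.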