Research On Customer Churn Prediction Of Commercial Banks Based On Mixed Feature Selection

Posted on:2023-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y Jin

GTID:2568306842471724

Subject:Applied Statistics

Abstract/Summary:

Against the background of an ever-changing economic situation and increasingly stringent financial regulation, retail business has become an issue that no major bank can avoid. As the cornerstone of retail business development, individual customers have become the focus of competition between the banking industry and online finance, while customer churn at commercial banks has grown increasingly serious. It is therefore of great significance for the survival and development of banks to identify the factors behind churn from large amounts of customer information and to build an early-warning model for customer churn. Starting from the current situation of China’s commercial banks and the shortcomings of existing risk-control evaluation methods, this thesis takes the customer information data of one commercial bank as an example, addresses the limitations of filter and wrapper feature selection methods, and proposes a customer churn prediction model based on a mixed feature selection method. The main contents of the thesis are as follows:

(1) First, the thesis performs a visual analysis of the categorical and numerical features and sorts out the important factors related to customer churn. Based on the feature distributions, it retains outliers and treats missing (default) values as one of the feature values. Then, in line with the characteristics of tree models, categorical features are encoded as numerical features.

(2) Second, to account for correlation and redundancy among high-dimensional features, the measurement criterion of the maximum relevance minimum redundancy (mRMR) algorithm is improved: mutual information is replaced by the maximal information coefficient (MIC), which addresses the original algorithm's inefficiency on large samples and its inability to measure the correlation between continuous features accurately. By progressively deleting the lowest-ranked (tail) features, the thesis places a dividing line at the point where the model's predictive performance drops sharply, so that redundant features and features with low predictive power are removed quickly. Compared with single-index filtering algorithms, the improved algorithm is more effective.

(3) Third, recursive feature elimination with cross-validation (RFECV) and the Boruta algorithm are used for a second round of feature screening in four ensemble models (XGBoost, LightGBM, CatBoost, and random forest). Compared with the feature importances of the original tree models, these two algorithms reduce the mutual influence of coupled features and avoid the risk of overestimating the importance of random features. While keeping any loss in predictive performance as small as possible, the thesis selects a different optimal feature subset for each model. The original 625 features are reduced to between 14 and 89 depending on the model, which preserves the effect of feature selection while improving the efficiency of model training.

(4) Building on these results, the thesis obtains the optimal hyperparameters of each model through Bayesian optimization. To combine the strengths of the individual models and improve predictive performance and robustness, it draws on differences in feature sets, models, and parameters and uses a stacking framework with 5-fold cross-validation to construct a customer churn prediction model based on differentiated feature sets. Compared with the single models and a voting-based fusion model, stacking achieves the highest prediction accuracy and identifies churning customers better. Finally, directions for future research are outlined in terms of data balancing, data timeliness, and the scale of model ensembling.
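Step (1) encodes categorical features as numbers for the tree models and keeps missing values as a value of their own. The abstract does not name the encoding, so the following is only a minimal sketch assuming a frequency-ordered integer (label) encoding; the sentinel `MISSING` and the function `encode_categorical` are hypothetical names.

```python
from collections import Counter

# Hypothetical sentinel: a missing entry becomes a category of its own.
MISSING = "__MISSING__"


def encode_categorical(column):
    """Label-encode one categorical column, keeping missing values as a category.

    Codes follow descending frequency (ties broken by the category's string form),
    so the same data always yields the same mapping.
    """
    filled = [MISSING if value is None else value for value in column]
    counts = Counter(filled)
    order = sorted(counts, key=lambda cat: (-counts[cat], str(cat)))
    mapping = {cat: code for code, cat in enumerate(order)}
    return [mapping[value] for value in filled], mapping


codes, mapping = encode_categorical(["gold", None, "silver", "gold", None, "gold"])
print(codes)  # [0, 1, 2, 0, 1, 0]
```

Tree models split on thresholds, so an arbitrary integer code is usually workable for them, whereas linear models would typically need one-hot encoding instead.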
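Step (2) replaces mutual information with the maximal information coefficient (MIC) inside the max-relevance min-redundancy (mRMR) criterion. The sketch below is a simplified illustration under stated assumptions, not the thesis's implementation: `approx_mic` only searches equal-frequency grids whose cell count stays within an assumed budget of n^0.6 (exact MIC also optimizes where the grid lines fall, so this version generally under-estimates it), and `mrmr_rank` uses the common "relevance minus mean redundancy" form of mRMR. All function names are hypothetical.

```python
import math
from collections import Counter


def _equal_freq_bins(values, k):
    """Assign values to at most k contiguous, roughly equal-frequency bins.

    Tied values share the bin of their first occurrence in sorted order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    first_rank, bins = {}, [0] * n
    for rank, i in enumerate(order):
        first_rank.setdefault(values[i], rank)
        bins[i] = first_rank[values[i]] * k // n
    return bins


def _mutual_info(bx, by):
    """Mutual information (nats) of two discrete label sequences."""
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def approx_mic(x, y):
    """Coarse MIC approximation: searches equal-frequency grids only."""
    n = len(x)
    budget = max(4, int(n ** 0.6))  # assumed cell budget B(n) = n^0.6
    best = 0.0
    for nx in range(2, budget // 2 + 1):
        bx = _equal_freq_bins(x, nx)
        for ny in range(2, budget // nx + 1):
            by = _equal_freq_bins(y, ny)
            k = min(len(set(bx)), len(set(by)))  # occupied bins on the smaller axis
            if k >= 2:
                best = max(best, _mutual_info(bx, by) / math.log(k))
    return best


def mrmr_rank(columns, target, n_select, assoc=approx_mic):
    """Greedy mRMR: maximise relevance to the target minus mean redundancy with chosen features."""
    relevance = {f: assoc(col, target) for f, col in columns.items()}
    redundancy = {f: 0.0 for f in columns}
    selected, remaining = [], set(columns)
    while remaining and len(selected) < n_select:
        def score(f):
            return relevance[f] - (redundancy[f] / len(selected) if selected else 0.0)
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.discard(best)
        for f in remaining:
            redundancy[f] += assoc(columns[f], columns[best])
    return selected
```

Because the association measure is passed in as `assoc`, the same greedy loop can be run with plain mutual information for the kind of single-index comparison the abstract mentions.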
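The tail-deletion rule in step (2) can be illustrated as follows: features are dropped from the end of a ranking (for example the mRMR order) as long as the model's score stays within a tolerance of the full-set score. The `evaluate` callback (e.g. a cross-validated AUC) and the `max_drop` threshold are assumptions; the abstract does not state how the dividing line was chosen.

```python
def tail_cutoff(ranked, evaluate, max_drop=0.01, step=1):
    """Drop features from the tail of `ranked` while the score stays near the full-set score.

    ranked   : feature names, most useful first (e.g. the mRMR order)
    evaluate : hypothetical callback, feature list -> score (higher is better)
    max_drop : assumed tolerance; a larger drop marks the dividing line
    """
    baseline = evaluate(ranked)
    keep = len(ranked)
    while keep - step >= 1:
        if baseline - evaluate(ranked[: keep - step]) > max_drop:
            break  # removing the next tail block would hurt the model too much
        keep -= step
    return ranked[:keep], baseline
```

Comparing each candidate against the full-set baseline, rather than against the previous step, keeps many small declines from accumulating unnoticed.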
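Step (3) first uses recursive feature elimination with cross-validation (RFECV). In scikit-learn this is `sklearn.feature_selection.RFECV`; the library-free sketch below only shows the idea under simplifying assumptions. The ranking is recomputed on the full data after each elimination round, and `importance_fn` and `cv_score_fn` are hypothetical callbacks standing in for a tree model's feature importances and its cross-validated score. scikit-learn's version instead runs the elimination inside every fold and averages the scores per subset size.

```python
def rfe_cv(features, importance_fn, cv_score_fn, step=1, min_features=1):
    """Recursive feature elimination scored by cross-validation (simplified sketch).

    importance_fn : hypothetical callback, feature list -> {feature: importance}
    cv_score_fn   : hypothetical callback, feature list -> mean CV score (higher is better)
    Returns (best subset, its score); on ties the smaller subset wins.
    """
    current, history = list(features), []
    while True:
        history.append((list(current), cv_score_fn(current)))
        if len(current) - step < min_features:
            break
        importance = importance_fn(current)
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)
        current = ranked[: len(current) - step]  # drop the `step` least important
    # history runs from large to small subsets, so scanning it reversed
    # makes max() return the smallest subset among equally good ones
    return max(reversed(history), key=lambda entry: entry[1])
```

Re-ranking after every round is what distinguishes RFE from a single importance ranking: once a coupled feature is removed, its partner's importance can change.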
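The Boruta screening in step (3) compares every real feature with shuffled "shadow" copies of all features. The sketch assumes a single pass followed by binomial tests on the hit counts; the full Boruta algorithm additionally drops rejected features between rounds and corrects for multiple testing. `abs_corr_importance` is only a stand-in for the importances of the XGBoost, LightGBM, CatBoost, or random-forest models used in the thesis, chosen so the example runs with the standard library alone; both function names are hypothetical.

```python
import math
import random


def abs_corr_importance(columns, y):
    """Stand-in importance: |Pearson correlation| with the target (0 for constant columns)."""
    n = len(y)
    my = sum(y) / n
    syy = sum((b - my) ** 2 for b in y)
    scores = {}
    for name, col in columns.items():
        mx = sum(col) / n
        sxx = sum((a - mx) ** 2 for a in col)
        sxy = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores[name] = abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return scores


def boruta_single_pass(columns, y, importance=abs_corr_importance, n_iter=20, alpha=0.05, seed=0):
    """Count how often each feature beats the best shuffled shadow, then test the hit counts."""
    rng = random.Random(seed)
    hits = dict.fromkeys(columns, 0)
    for _ in range(n_iter):
        shadows = {}
        for name, col in columns.items():
            shuffled = list(col)
            rng.shuffle(shuffled)  # breaks any link to y but keeps the distribution
            shadows[("shadow", name)] = shuffled  # tuple keys cannot clash with real names
        imp = importance({**columns, **shadows}, y)
        best_shadow = max(imp[key] for key in shadows)
        for name in columns:
            if imp[name] > best_shadow:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        # Null hypothesis: a feature beats the best shadow with probability 0.5.
        p_high = sum(math.comb(n_iter, k) for k in range(h, n_iter + 1)) / 2 ** n_iter
        p_low = sum(math.comb(n_iter, k) for k in range(0, h + 1)) / 2 ** n_iter
        if p_high < alpha:
            decisions[name] = "confirmed"
        elif p_low < alpha:
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions
```

Benchmarking against the best shadow rather than an absolute importance cut-off is what guards against the overestimation of random features that the abstract mentions.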
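Step (4) stacks models that were trained on different feature subsets. The core of stacking with 5-fold cross-validation is the out-of-fold (OOF) prediction matrix on which the meta-learner is trained. This sketch assumes deterministic interleaved folds (row i goes to fold i mod k), and `lookup_learner` is a hypothetical toy base learner; in the thesis the base learners are the tuned tree ensembles, and the abstract does not say which meta-learner was used.

```python
from collections import defaultdict


def lookup_learner(X, y):
    """Hypothetical toy base learner: predicts the training churn rate of identical rows."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row, label in zip(X, y):
        sums[tuple(row)] += label
        counts[tuple(row)] += 1
    prior = sum(y) / len(y)  # fallback for feature values never seen in training

    def predict(rows):
        return [sums[tuple(r)] / counts[tuple(r)] if tuple(r) in counts else prior for r in rows]

    return predict


def _project(rows, cols):
    """Keep only the columns in `cols` (one learner's own feature subset)."""
    return [[row[c] for c in cols] for row in rows]


def oof_meta_features(base_learners, X, y, k=5):
    """Out-of-fold predictions of each (feature subset, learner) pair, one column per learner."""
    n = len(y)
    meta = [[0.0] * len(base_learners) for _ in range(n)]
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        for j, (cols, learner) in enumerate(base_learners):
            predict = learner(_project([X[i] for i in train_idx], cols), [y[i] for i in train_idx])
            preds = predict(_project([X[i] for i in test_idx], cols))
            for i, p in zip(test_idx, preds):
                meta[i][j] = p  # row i never contributed to the model that scored it
    return meta
```

Each base learner sees only its own column subset (`cols`), which is how differentiated feature sets enter the ensemble. To score new customers, the base learners would be refit on all training rows and their predictions passed to the meta-learner in the same column order.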

Keywords/Search Tags:

customer churn, feature selection, ensemble learning, model fusion

Related items

1	Customer Churn Prediction Based On Xgboost And Logistics Hybrid Algorithm
2	Prediction And Analysis Of Telecom Customer Churn Warning Model Based On Machine Learning
3	Aviation Customer Value Assessment And Churn Prediction Model Based On Data Mining Analysis
4	Telecom Customer Churn Prediction And Application Based On Ensemble Learning Fusion Mode
5	Broadcasting Customer Loss Based On Double-layer Feature Selection Predictive Model Research
6	Research On Customer Churn Prediction Based On Ensembled Deep Learning
7	The Application Of Ensemble Learning In The Early Warning Model Of Operator User Churn
8	Design And Implementation Of Telecom Customer Churn Prediction Model Based On Particle Swarm Optimization Algorithm
9	Feature Selection On Customer-churn Model In Broker
10	Research On Telecom Customer Churn Based On Machine Learning