| With the rapid development of the information industry and fierce competition in the telecommunications sector, customer churn among telecommunications operators is becoming increasingly serious, particularly against the background of "number portability". Because the pool of new customers in the telecommunications market is limited and acquiring them is costly, predicting customer churn and taking measures to retain existing customers has become an important part of business analysis for operators. However, traditional churn prediction models do not take customer value into account, which leads to poor predictive performance for some loyal customer groups and makes it difficult to identify the reasons for churn in different value groups. Therefore, this article analyzes the basic information and behavioral characteristics of telecommunications customers, divides them into groups based on customer value, and then builds churn prediction models for each group. The best-performing models are then combined to improve churn prediction across the different customer value groups. This study helps telecommunications operators understand the behavior patterns, consumption habits, service needs, and other characteristics of different customer groups, so that they can manage customers in a refined manner, improve the level of customer maintenance, strengthen their competitive advantage, and achieve sustainable development. The specific work of this article is as follows. First, visual analysis and preprocessing of the data give an initial understanding of the impact of each variable on customer churn, laying the foundation for subsequent modeling. Second, a customer value indicator is constructed based on the RFM (recency, frequency, monetary) model, and a K-means clustering algorithm optimized by particle swarm optimization is used to segment customers, with the optimal number of clusters determined by the elbow method and the silhouette coefficient; each resulting segment is then characterized. Third, based on the segmentation results, logistic regression, random forest, XGBoost, SVM, and CatBoost classifiers are used to build churn prediction models for each customer segment. Finally, the predictive performance of the models before and after customer segmentation is compared, and SHAP is used to examine the interpretability of the models, analyzing the impact of features on churn both for individual customers and for the overall sample. The study improved the predictive performance and practical applicability of the model: the combination of sub-models identified churning customers markedly better, both on the full sample and within the individual value groups. Specifically, (1) on the overall sample, with prediction accuracy kept above 75%, the recall for the churn class in the segmented combined prediction was 79%, about 22 percentage points higher than the 57% recall of the best model built without customer clustering. (2) Within the individual value groups, the segmented combined model also outperformed the model built on the overall data, especially for the general-value customer group and the high-consumption loyal customer group, whose recall rates were 94% and 76%, respectively, after segmentation and combination, compared with 20% and 11%, respectively, before segmentation and combination. |
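
The comparison between a single model and a segmented combined model, measured by recall on the churn class, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the study's implementation: the segment names, the record fields, the threshold rules standing in for trained classifiers, and the toy records are all hypothetical and were made up for this example. It shows how each customer is routed to the model of its own segment and how recall (true positives divided by all actual churners) is then computed on the combined predictions.

```python
def recall(y_true, y_pred, positive=1):
    """Recall of the positive (churn) class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of customers whose label was predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def combined_predict(rows, segment_models):
    """Route every customer to the model of its own segment."""
    return [segment_models[row["segment"]](row) for row in rows]


# Toy records, invented for illustration only (not data from the study).
rows = [
    {"segment": "general", "complaints": 3, "churn": 1},
    {"segment": "general", "complaints": 0, "churn": 0},
    {"segment": "general", "complaints": 2, "churn": 1},
    {"segment": "loyal_high", "complaints": 1, "churn": 1},
    {"segment": "loyal_high", "complaints": 0, "churn": 0},
    {"segment": "loyal_high", "complaints": 0, "churn": 0},
]


# Hypothetical stand-ins for trained classifiers: one rule for everyone ...
def global_model(row):
    return int(row["complaints"] >= 3)


# ... versus one rule per value segment, each with its own threshold.
segment_models = {
    "general": lambda row: int(row["complaints"] >= 2),
    "loyal_high": lambda row: int(row["complaints"] >= 1),
}

y_true = [row["churn"] for row in rows]
y_global = [global_model(row) for row in rows]
y_combined = combined_predict(rows, segment_models)

print("global   recall:", recall(y_true, y_global), "accuracy:", accuracy(y_true, y_global))
print("combined recall:", recall(y_true, y_combined), "accuracy:", accuracy(y_true, y_combined))
```

In this toy setup the single rule misses churners whose warning signs are weaker in their segment, while the per-segment rules catch them. In practice the threshold rules would be replaced by classifiers fitted separately on each segment, and recall would be reported both overall and per segment.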