| With the popularization and deepening of the Internet, the market has become increasingly saturated, which makes it difficult for enterprises to keep expanding market share by attracting new customers as they did before. On the one hand, it is more expensive to develop and attract new customers in a saturated market; on the other hand, enterprises need new development models to complete the transition from user growth to service quality improvement. Retaining existing customers and preventing churn has therefore become a core focus for enterprises. Targeting the telecom customer churn prediction scenario, this thesis first proposes L-CCASmote, an imbalanced-data prediction method based on LASSO and the Constructive Covering Algorithm (CCA), and then combines it with customer segmentation to predict telecom customer churn. Experiments are performed on public datasets and desensitized telecom datasets. The main work of this thesis is as follows: 1. To address the severe class imbalance in customer data, the cluttered feature space, and the collinearity between features, an imbalanced-data prediction method, L-CCASmote, based on LASSO and CCA is proposed. The method first extracts churn-related features through LASSO to optimize the model input; it then builds a CCA neural network to construct covers that conform to the overall distribution of the samples; finally, it applies a hybrid sampling scheme that combines a single-sample coverage strategy, a sample diversity strategy, and a sample density peak strategy to balance the data. The effectiveness of LASSO is first verified through comparative experiments against L1-based feature selection, L2-based feature selection, and no feature selection. L-CCASmote is then compared with SMOTE-ENN, SMOTE-Tomek, Borderline-SMOTE, Adaptive Synthetic Sampling (ADASYN), and One-Sided Selection (OSS), using logistic regression and support vector machine classifiers. The results show that the L-CCASmote balancing method is more effective in improving the model’s churn recognition rate and predictive classification ability. 2. Customers differ in their value to an enterprise, and these differences are reflected both in the characteristics of the samples and in how the enterprise allocates resources in its CRM decision-making. A telecom customer churn prediction method based on customer segmentation and L-CCASmote is therefore further proposed. This method establishes a telecom customer value evaluation system with 10-dimensional features, uses an optimized K-means clustering algorithm together with principal component analysis to classify customers into three value groups (high, medium, and low), and then uses L-CCASmote to balance the data of each group. Finally, prediction experiments are performed on the balanced dataset of each value group and compared with the unsegmented dataset; the results show that the segmentation-based method improves the overall churn recognition rate. The work in this thesis shows that the telecom customer churn prediction method based on customer segmentation and L-CCASmote can not only effectively identify customers with a tendency to churn, but also reduce the impact on the prediction results of the different churn patterns found in different customer value groups. Prediction accuracy is thereby improved, which is of practical significance. |
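
To make the data-balancing step concrete, the following is a minimal sketch of plain SMOTE, the baseline oversampling technique underlying several of the methods compared above. It is not L-CCASmote: the CCA covers and the three sampling strategies are not reproduced here. The function name `smote`, its parameters, and the toy customer data are all assumptions introduced only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by plain SMOTE interpolation.

    Hypothetical helper for illustration; this is NOT the L-CCASmote method.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 8 retained customers vs. 3 churners, two features each.
retained = [[float(x), float(x % 3)] for x in range(8)]
churned = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Oversample the churners until both classes have the same size.
new_points = smote(churned, n_new=len(retained) - len(churned), k=2)
balanced_churned = churned + new_points
print(len(retained), len(balanced_churned))
```

Each synthetic point lies on a line segment between two real churners, so the minority class grows without duplicating samples verbatim. A hybrid scheme such as the one described in the abstract would additionally remove or reweight majority samples.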