With the development of the social economy, competition in the telecommunications industry has become increasingly fierce, and customer churn has become a growing concern for every enterprise. Studies have shown that retaining an existing customer costs much less than acquiring a new one, so addressing churn is an urgent problem. By using data mining methods to identify customers who intend to churn and to analyze why they churn, operators can take timely retention measures and avoid the heavy losses that churn causes. In this paper, we first explore the data through visualization to uncover its underlying patterns. Second, we preprocess the data: aggregating multiple tables, removing irrelevant features, encoding features, normalizing the data, and balancing the classes with SMOTE. Finally, we model the data from three perspectives: single models (logistic regression, decision tree, SVM, and KNN); ensemble models (random forest, XGBoost, AdaBoost, LightGBM, and CatBoost); and combined models, which combine SVM, LightGBM, and CatBoost in three ways: Hard Voting, Soft Voting, and Stacking. The hyperparameters of all models are tuned with grid search and learning curves, and the models are evaluated primarily by F1-Score, with Recall as a secondary reference. The results show that SVM performs best among the single models, CatBoost performs best among the ensemble models, and Hard Voting performs best among the combined models. In addition, the ensemble models perform significantly better than the single models. For example, CatBoost achieves an F1-Score of 0.8825, a Recall of 0.8779, and an AUC of 0.9496, which improve on SVM by 1%, 1.4%, and 1%, respectively. Hard Voting in turn improves F1-Score, Recall, and AUC over CatBoost by 0.13%, 0.5%, and 0.04%, respectively. These results indicate that the combined model can predict customer churn accurately. Among the factors influencing churn, Internet security service, contract type, monthly fee, monthly usage, and tenure have a more significant impact, whereas gender, paperless billing, and whether the customer has a partner have only a minor effect.
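Feature coding and data normalization are only named above, so the exact encoders used in the paper are not specified. As a minimal sketch, assuming one-hot encoding for categorical columns and min-max scaling for numeric columns, the two steps look like this in plain Python; the function names and the column values are hypothetical illustrations, not data from the paper.

```python
from typing import List, Sequence


def one_hot(values: Sequence[str]) -> List[List[int]]:
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))  # sorted for a reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical contract-type and monthly-fee columns for four customers.
contract = ["monthly", "one_year", "monthly", "two_year"]
monthly_fee = [20.0, 60.0, 40.0, 100.0]
print(one_hot(contract))    # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(min_max(monthly_fee)) # [0.0, 0.5, 0.25, 1.0]
```

In practice the category list and the minimum and maximum should be computed on the training split only and then reused on the test split, so that no test information leaks into preprocessing. Scaling matters here because scale-sensitive methods such as KNN, SVM, and SMOTE compare feature values directly.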
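SMOTE balances the classes by synthesizing new minority-class (churn) samples instead of duplicating existing ones: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbors. The sketch below implements that interpolation from scratch for numeric features, assuming the features are already encoded and normalized; the function name `smote`, its parameters, and the sample rows are hypothetical. A library implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) would normally be used instead.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Point]:
    """Generate n_synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample x, pick one of its
    k nearest minority neighbors x_nn, and return x + gap * (x_nn - x) with
    gap drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point cannot have more neighbors than other points

    # Pre-compute the k nearest minority neighbors (Euclidean) of every sample.
    neighbors = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()
        x, x_nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


# Hypothetical normalized churn rows with two features each.
churners = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.30, 0.30]]
new_rows = smote(churners, n_synthetic=4, k=2, seed=1)
print(len(new_rows))  # 4
```

Oversampling should be applied to the training split only; synthesizing points before the train/test split lets near-copies of training rows reach the test set and inflates the reported scores.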
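The difference between the Hard Voting and Soft Voting combinations can be seen on a single customer. Hard voting thresholds each model's output into a class label and takes the majority; soft voting averages the models' predicted churn probabilities and thresholds the average. The sketch assumes a 0.5 threshold and uses invented probabilities for SVM, LightGBM, and CatBoost; the helper names are hypothetical and this is not the paper's code.

```python
from typing import Dict, Sequence


def hard_vote(labels: Sequence[int]) -> int:
    """Majority vote over 0/1 labels (1 = churn); a tie goes to churn (arbitrary choice)."""
    return 1 if sum(labels) * 2 >= len(labels) else 0


def soft_vote(probas: Sequence[float], threshold: float = 0.5) -> int:
    """Average the models' churn probabilities, then threshold the mean."""
    mean = sum(probas) / len(probas)
    return 1 if mean >= threshold else 0


def combine(probas: Dict[str, float], threshold: float = 0.5) -> Dict[str, int]:
    """Return both combined predictions for one customer."""
    labels = [1 if p >= threshold else 0 for p in probas.values()]
    return {"hard": hard_vote(labels),
            "soft": soft_vote(list(probas.values()), threshold)}


# Invented churn probabilities from the three base models for one customer.
customer = {"svm": 0.55, "lightgbm": 0.52, "catboost": 0.10}
print(combine(customer))  # {'hard': 1, 'soft': 0}
```

Here two models are only marginally confident that the customer churns while the third is confidently negative, so the two rules disagree. In scikit-learn the same choice is exposed as `VotingClassifier(voting="hard")` versus `voting="soft"` (an `SVC` must be created with `probability=True` to take part in soft voting), while Stacking (`StackingClassifier`) instead trains a meta-model on the base models' predictions.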
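F1-Score, Recall, and AUC are all computed for the churn class. Recall is the share of actual churners the model catches, precision is the share of predicted churners who actually churn, F1 is the harmonic mean of the two, and AUC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. A plain-Python sketch, with churn encoded as 1 and toy labels and scores made up for illustration:

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (churn = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a churner > score of a non-churner); ties count 1/2.

    Compares every positive/negative pair, so it is O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: 3 churners caught, 1 missed, 2 false alarms.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Equivalent functions are available as `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` in `sklearn.metrics`.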