
Empirical Research on the Shanghai Telecom Customer Churn Problem Based on Data Mining

Posted on: 2016-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Xie
Full Text: PDF
GTID: 2309330479985402
Subject: Applied statistics
Abstract/Summary:
Data mining is used more and more widely in the telecom industry, its methods are steadily maturing, and the depth of mining is gradually increasing. Enterprises attach importance to precision marketing based on data mining. The explosive growth in data volume has made data mining an important means for enterprises to compete for the market. Against this background, this paper studies data mining methods in the telecommunications industry. Using the R language, it examines preprocessing techniques, classification algorithms for high-dimensional, imbalanced data, clustering techniques for the customer churn problem, and the reasons for customer churn.

The focus of this paper is handling high-dimensional, imbalanced telecom data and building a customer churn classification model. The high dimensionality and class imbalance of telecom data are among the main causes of unstable models and low accuracy, a problem that has long troubled data mining engineers. How to analyze high-dimensional, imbalanced telecom data is therefore the main subject of this paper. The paper concentrates on the Bagging, AdaBoost, and Random Forest algorithms. The results show that the AdaBoost classification model achieves a coverage rate 6% higher than that of the Bagging model, with the error rate falling from 86.96% to 39.64%, and that the optimal model is obtained after the Random Forest model is tuned.

Customer churn has long been one of the most important issues for telecom enterprises, since losing customers causes them great losses. Therefore, in addition to predicting which users will churn, this paper examines the characteristics of customer churn indicators and compares churned and non-churned customers on their communication indicators. It also analyzes which communication indicators play the key role in churn. Finally, it analyzes the reasons for customer loss from the perspective of users' package series, providing direction and strategy for subsequent model research.
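The point about imbalanced data can be illustrated with a small sketch. It is not the thesis's code (the thesis uses R; Python is used here only for illustration), and the definitions are assumptions: "coverage rate" is taken to mean the share of actual churners that the model flags, and "error rate" the share of all customers classified wrongly. The abstract does not state the exact definitions behind its reported figures, and the customer counts below are hypothetical.

```python
def churn_metrics(actual, predicted):
    """Return (coverage, error_rate) for binary labels where 1 = churn.

    Assumed definitions (not taken from the thesis):
      coverage   = churners correctly flagged / all actual churners
      error_rate = wrongly classified customers / all customers
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    churners = sum(1 for a in actual if a == 1)
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    coverage = true_pos / churners if churners else 0.0
    error_rate = wrong / len(actual)
    return coverage, error_rate


# Hypothetical imbalanced data set: 1000 customers, only 50 of them churn.
actual = [1] * 50 + [0] * 950

# A model that never predicts churn: low error rate, but it finds no churners.
always_stay = [0] * 1000

# A model that flags 30 of the 50 churners at the cost of 100 false alarms.
flagging = [1] * 30 + [0] * 20 + [1] * 100 + [0] * 850

print(churn_metrics(actual, always_stay))
print(churn_metrics(actual, flagging))
```

Under these assumptions the first model has the lower error rate yet covers no churners at all, which is why a single accuracy figure can mislead on imbalanced churn data and why coverage is worth reporting alongside it.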
Keywords/Search Tags: Data Mining, Clustering Algorithm, Integrated Algorithm, Telecom Data, R Language