
Research On Mobile Customer Churn Based On Data Mining

Posted on: 2008-08-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Y Liu
Full Text: PDF
GTID: 1118360212997640
Subject: Communication and Information System
Abstract/Summary:
With the opening and maturing of the telecommunication industry and the arrival of 3G, competition among telecommunication companies is becoming intense. The database systems these companies maintain make it possible to apply data mining to this research. Customer retention is their core issue. Research from Harvard University indicates that a 5% drop in customer consumption can affect investors' confidence in a company's earnings. Because attracting new customers is more expensive, keeping existing customers is important for telecommunication companies. Churn that occurs without warning is a persistent problem and is hard for them to control: once customers have decided to leave, it is difficult to persuade them to stay, even with better plans. Data mining significantly improves a company's ability to predict and control churn. Companies can build models with data mining tools from customers' personal information, calling histories, and churn records. Using the churn predictions obtained from these models, sales staff can take more proactive and better-targeted retention measures than before.

Churn prediction plays a major role in the analysis and operation systems of telecommunication companies. With a churn prediction model, a company can identify the customers most likely to churn and find effective ways to retain them.
Therefore, research on churn prediction is significant for telecommunication companies seeking to reduce operating costs and increase profits.

Based on the current state of the telecommunication industry and an analysis of the theory, technology, and applications of data mining, the dissertation proposes genetic and chain algorithms for predicting churn and builds churn prediction models. The main work and contributions are as follows:

1. Analyzes data mining technology, including the data mining process, data mining methods, and applications of data mining in the telecommunication industry. Presents prediction methods, analysis of customer characteristics, identification of major customers, churn prediction, and identification of user groups.

2. Analyzes the shortcomings of existing research and proposes methods, such as discretization of continuous data and feature reduction, for applying data mining to churn prediction. Evolutionary computation requires continuous data to be discretized. In preprocessing customer information, some attributes are numeric; they must be discretized, grouped, and converted into categorical data. Discretization of continuous data preprocesses numeric data into categorical data that follows the same distribution as the original. It is important to the whole prediction process that uses the ECCA algorithm in the dissertation. Building on the K-means algorithm, the dissertation proposes a self-organizing distribution algorithm to discretize continuous data. It addresses the problem in K-means that the choice of K can affect the clustering result. The self-organizing distribution algorithm consists of m neurons, where m is the maximum number of groups, and it takes the feature values as its input.
During self-organization, the neuron weights are updated repeatedly and move toward the actual categories of the feature values; discretization is complete when no neuron is updated any further. The data warehouses of the telecommunication industry contain a very large number of features. Some features are closely related, and some redundant features remain in the discretized data. To make the algorithm more effective, feature selection is needed to obtain a minimal feature set that retains the properties of the original one. For high-dimensional data, the time spent on data mining and data analysis grows exponentially with the dimension of the data, so proper feature selection is needed to reduce the dimension while preserving those properties. The dissertation proposes the χ² statistic as the measure of correlation among features. The confidence level a for independence is obtained from the χ² table. For a subset of features, two lists, List_f,c and List_f,s, are built based on a. List_f,c lists the correlations between the class and the features in descending order. List_f,s lists the correlations between the reference features and the features in descending order. By selecting features according to the potential difference of a given feature between the two lists, the dissertation proposes the feature selection algorithm FSBPD. The algorithm removes redundant features that do not contribute to the decision while keeping the properties of the original data. Finally, the dissertation analyzes the theory behind the algorithm and reports experimental results, which show that FSBPD has sound feature selection capability.

3. Summarizes the methodology of evolutionary computation. Evolutionary computation simulates the mechanism of survival of the fittest in biological evolution and the rules by which genetic information is transmitted.
The dissertation introduces the major branches and the mathematical background of evolutionary computation. Because evolutionary computation is well suited to optimization problems, the dissertation proposes the ECCA model for churn prediction based on it. The ECCA model starts its search from a population, so that it can reach a global optimum rather than a local one, and it includes the basic processes of evolutionary computation. The quality of the rules output by the first layer strongly affects the prediction of the whole model. Therefore, starting from the first-layer rules obtained by probability induction and traditional genetic computation, the dissertation incorporates background knowledge: it divides the features into two distinct categories, creates first-layer rules within each category without crossing, and creates crossing rules between the categories. From the second layer onward, crossing is no longer restricted to within each category, so that potential rules can be found. The process repeats until no new, valuable rules can be created. Once all rules have been created, the ECCA model encodes them into a single expression. The experimental results show that the ECCA model has better prediction capability, with better overall classification results, than C4.5.

4. For customers' historical and temporal data, the dissertation proposes a chain data mining algorithm, combines it with a decision tree, and creates a combined chain-tree classifier (CTC). It builds a model to predict customer churn, runs simulations, and compares the experimental results.

5. For churn caused by competitors' new policies, the dissertation proposes a competition-based prediction model. It compares the effects of different calling plans within the company and among competitors, predicts churn, and compares the experimental results.
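
As an illustration of the χ² correlation measure mentioned in item 2, the following minimal Python sketch computes the Pearson χ² statistic between each discretized feature and the churn class, then lists the features in descending order of that score, in the spirit of the class-versus-feature list. It is not the FSBPD algorithm, whose details the abstract does not give. The function names `chi_square` and `rank_by_class_correlation`, the feature names, and the toy data are all hypothetical and chosen only for this example.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic of independence for two categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))   # observed count of each (feature value, class) pair
    x_totals = Counter(xs)         # marginal counts of feature values
    y_totals = Counter(ys)         # marginal counts of class values
    stat = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n             # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_by_class_correlation(features, labels):
    """Return (feature names in descending chi-square order, score per feature)."""
    scores = {name: chi_square(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy, invented data: two already-discretized features and a churn label.
features = {
    "plan": ["basic", "basic", "premium", "premium",
             "basic", "premium", "basic", "premium"],
    "region": ["north", "south", "north", "south",
               "north", "south", "north", "south"],
}
churn = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]

ranking, scores = rank_by_class_correlation(features, churn)
print(ranking)
print(scores)
```

In this toy data, `plan` separates churners from non-churners perfectly, so it scores higher than `region`. A real selection step would also compare each score against a critical value from the χ² table at the chosen confidence level before keeping or discarding a feature.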
Keywords/Search Tags: Data Mining, Prediction of Churn, Sequential Pattern Mining, Evolutionary Computation, Feature Selection, Classification