
Research on Customer Churn Prediction Based on Imbalanced Datasets

Posted on: 2012-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yang
Full Text: PDF
GTID: 2218330338467190
Subject: Systems Engineering
Abstract/Summary:
Customer churn prediction is an application of data mining to Customer Relationship Management (CRM). In data mining, class imbalance is one of the main obstacles. Traditional machine learning methods take overall accuracy as the learning goal, so they predict the majority class well but the minority class poorly. Customer churn prediction faces the same class imbalance problem. For example, according to a survey, the customer churn rate in the telecommunication industry is about 2%. If a classifier predicts every customer as a normal (non-churning) customer, its overall accuracy is 98%, but its accuracy on churning customers is 0%. Such a classifier is useless for churn prediction. Class imbalance is therefore one of the main factors that hinder customer churn prediction. Because churn prediction is receiving growing attention in the service industry, it is important to study how to overcome the class imbalance problem in this setting.

This paper first studies how class imbalance is handled in data mining in general, and then studies how to overcome it in customer churn prediction in two ways: (1) balancing the original dataset; (2) improving traditional machine learning methods. Concretely, the paper completes three pieces of work.

1. This paper studies RUS (random under-sampling) and SMOTE (Synthetic Minority Over-sampling Technique), two conventional sampling techniques in data mining, in customer churn prediction. The experimental results show that the two techniques do not always increase the accuracy of churn prediction and can even harm it. Based on this result, the paper applies RERUS to customer churn prediction. The experimental results show that RERUS can effectively improve the accuracy of churn prediction.

2. This paper applies the AUC optimization method, one of the methods for handling class imbalance, to customer churn prediction. AUC is the main criterion for evaluating a classifier on an imbalanced dataset, because it evaluates prediction accuracy on the majority class and on the minority class jointly, without favoring either. The AUC optimization method uses AUC itself as the learning goal when training the model. At present, one of the main AUC optimization methods is a linear classifier that optimizes AUC with a gradient method. However, the gradient method generally converges to a local optimum. This paper therefore introduces a genetic algorithm (GA) to optimize AUC and compares it with the gradient-based method. The experimental results show that a linear classifier based on AUC optimization is not suitable for customer churn prediction when the dataset is class-imbalanced.

3. This paper applies the weighted SVM and an improvement of it to customer churn prediction. The weighted SVM assumes that every bounded support vector (BSV) is misclassified, and on that basis sets the penalty parameters of the positive and negative classes according to the ratio of the number of positive samples to the number of negative samples. Theorem 5.3.1, put forward in this paper, contradicts this premise of the weighted SVM. Based on this result, the paper proposes the IWSVM, which uses the GA to optimize the SVM parameters so as to maximize AUC. The experimental results show that the IWSVM performs better than the weighted SVM and the traditional method, so the IWSVM is suitable for customer churn prediction when the dataset is class-imbalanced.

This work was partially supported by the National Natural Science Foundation of China (No. 70801021).
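
The 2% churn example in the abstract can be checked with a few lines of Python. This is only an illustrative sketch, not code from the thesis: the population of 1000 customers, the function names, and the constant scores are all assumptions made for the example. It shows why overall accuracy looks excellent for a classifier that never predicts churn, while accuracy on churners (recall) and AUC expose it as uninformative.

```python
def accuracy(y_true, y_pred):
    """Fraction of all customers whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual churners (positive class) that are predicted as churners."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def auc(y_true, scores):
    """Probability that a random churner is scored above a random non-churner.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), which is fine
    for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical population: 1000 customers with a 2% churn rate (1 = churn).
y_true = [1] * 20 + [0] * 980

# A classifier that labels every customer as "normal" and gives everyone
# the same churn score.
y_pred = [0] * 1000
scores = [0.0] * 1000

acc = accuracy(y_true, y_pred)   # overall accuracy
rec = recall(y_true, y_pred)     # accuracy on churners only
area = auc(y_true, scores)       # ranking quality

print(f"accuracy={acc:.2f} recall={rec:.2f} auc={area:.2f}")
```

Accuracy rewards the majority class, whereas AUC depends only on how churners are ranked relative to non-churners, which is why the abstract treats AUC as the evaluation criterion for imbalanced data.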
Keywords/Search Tags: customer churn prediction, class imbalance, sampling, AUC, SVM