The Parallel Algorithm Based On Hadoop In The Study Of The Prediction Of Customer Churn

Posted on: 2018-10-12, Degree: Master, Type: Thesis
Country: China, Candidate: X Huang, Full Text: PDF
GTID: 2348330533958995, Subject: Control engineering
Abstract/Summary:
With the rapid development of the communications industry, the number of users went through a period of explosive growth. Growth in new customers has now slowed, and because competition among operators is fierce, operators need to consider how to reduce the impact of customer churn on themselves. Predicting from historical data which customers are likely to be lost, and taking effective retention measures, has therefore become an urgent need for operators. Predicting the customers who tend to churn is a binary classification problem. The classification algorithm used in this paper is the Support Vector Machine (SVM), which generalizes well on binary classification tasks. However, the customers who tend to churn account for only a small part of an operator's entire customer base, so the data is unbalanced. This creates difficulties for traditional classification algorithms, whose results lean toward the class with more samples. This paper therefore improves the SVM algorithm to make it suitable for unbalanced data. To cope with massive data volumes in the future, the paper also parallelizes the improved algorithm using the MapReduce framework on the Hadoop platform.

SVM classifies data by constructing a linear boundary. When the data is not linearly separable, SVM uses a kernel function to map the data from a low-dimensional space to a high-dimensional space in which it becomes linearly separable. On unbalanced data, however, where the class of interest makes up only a small share of the whole data set, SVM leans toward the majority class and performs poorly. To address this problem, the paper proposes the DE-C-SVM algorithm. It incorporates a cost-sensitive approach that assigns different penalty factors to different classes, giving a higher penalty factor to misclassified minority-class samples, with the objective of minimizing global misclassification. It then uses the differential evolution algorithm to optimize the penalty factors and the kernel function parameters, which improves the classification performance of the algorithm.

The paper verifies the effectiveness of the proposed algorithm on eight unbalanced data sets from the UCI repository. Next, the algorithm is parallelized. Its scalability is compared between the Hadoop platform and a single-machine platform, and a speedup experiment is run on the Hadoop platform. The experimental results show that the algorithm on the Hadoop platform improves the efficiency of data processing. Finally, the paper builds a customer churn prediction model on the Hadoop platform. Customer data selected from an operator is preprocessed and then fed to the model. The experimental results show that the model achieves good prediction performance while also improving the efficiency of data processing. It can improve the efficiency of operators' policies and has practical significance for their daily operations.
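
The abstract does not give the details of the differential evolution step, so the following is only a minimal sketch of a generic DE/rand/1/bin search, written in plain Python. Everything specific in it is an assumption made for illustration: the population size, the scale factor `f`, the crossover rate `cr`, the search bounds, and above all `toy_cost`, a hypothetical stand-in for the real objective (which in the thesis would be a misclassification measure of the cost-sensitive SVM for a given pair of class penalty factors and a kernel parameter).

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` inside box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the search box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them equal to i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_score = objective(trial)
            # Greedy selection: the trial replaces its parent if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_cost(params):
    # HYPOTHETICAL objective, not from the thesis: a smooth bowl whose minimum
    # plays the role of the best (log10 C_minority, log10 C_majority, log10 gamma).
    # A real run would train the SVM and return its validation error here.
    log_c_minority, log_c_majority, log_gamma = params
    return ((log_c_minority - 2.0) ** 2
            + (log_c_majority - 0.5) ** 2
            + (log_gamma + 1.0) ** 2)


# Assumed search box in log10 space for the three parameters.
bounds = [(-3.0, 3.0), (-3.0, 3.0), (-3.0, 3.0)]
best_params, best_cost = differential_evolution(toy_cost, bounds)
```

Searching in log space is a common choice for penalty and kernel-width parameters because useful values span several orders of magnitude. Whether the thesis does the same is not stated in the abstract.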
Keywords/Search Tags:Customer Churn, Unbalanced Data, SVM, Differential Evolution, Hadoop