Research On Parallelization Of Classification Algorithm Based On Spark Platform In Telecom Customer Churn Prediction System

Posted on: 2016-04-05  Degree: Master  Type: Thesis
Country: China  Candidate: C W Lei  Full Text: PDF
GTID: 2348330479954687  Subject: Computer technology
Abstract/Summary:
Classification techniques from data mining are widely applied to predict customer churn in the telecommunications field. However, traditional classification algorithms mostly run on a single machine. While such algorithms can complete a classification task quickly, they perform poorly and inefficiently on massive telecom data because of limited resources. How to improve the efficiency of classification algorithms on massive data is an urgent problem for telecom enterprises.

Based on a telecom customer churn prediction system project, this thesis analyses and studies the problem of predicting customer churn in telecom enterprises, and carries out the requirements analysis and functional design of the system. After studying the various kinds of telecom data, a customer information model is built from the static and dynamic data of telecom customers and used as the input data model of the churn prediction system. In addition, the Spark platform and classification algorithms are studied to determine how to implement parallel classification algorithms on Spark for use in the churn prediction system.

In the telecom customer churn prediction system, a parallel softmax regression algorithm and a distance-based classification algorithm are designed and implemented, then packaged and tested on a Spark cluster and a Hadoop cluster.

The experimental results show that the softmax regression and distance-based classification algorithms implemented on the Spark platform achieve a high speedup ratio on huge amounts of data, with better parallelization performance and efficiency than on the Hadoop platform. As a result, the stability and reliability of the telecom customer churn prediction system can be greatly improved.
Keywords/Search Tags:Massive data, Classification Algorithm, Spark, Softmax Regression Algorithm, Distance-based Classification Algorithm