Research On Air Customer Segmentation Based On Spark Platform

Posted on: 2018-01-22   Degree: Master   Type: Thesis
Country: China   Candidate: C C Zhang   Full Text: PDF
GTID: 2348330536477767   Subject: Applied Mathematics
Abstract/Summary:
As e-commerce continues to develop in the aviation sector, airline information systems have accumulated near-massive volumes of data. Using a parallel big-data platform to analyze these data efficiently and mine valuable information from them is therefore of real practical value. Customer segmentation is the basis for airlines' targeted marketing: it helps a company identify its customer groups more accurately and manage customers more effectively for greater profitability. However, most current aviation customer segmentation relies only on experience and simple statistics. Such crude division methods cannot identify customer characteristics effectively or support reasonable marketing decisions.

Based on the Spark platform, this thesis studies aviation customer segmentation. Using real customer data from China Southern Airlines combined with business indicators, it establishes a segmentation model suited to airline customers, providing a reference for airlines that use passenger data to analyze customer characteristics and make effective strategic decisions. For near-massive passenger information data, the conventional single-machine approach is inadequate. This thesis proposes building a Spark distributed parallel processing platform for big data on multiple machines, with the HDFS distributed file system for data storage and resilient distributed datasets (RDDs) for data processing. To address the problems of choosing the value of K and the initial points in the K-Means clustering algorithm, this thesis proposes preprocessing the aviation customer data with the Canopy coarse clustering algorithm and then running an optimized K-Means clustering on the Canopy results. The experimental results show that the optimized algorithm achieves higher efficiency and accuracy than the traditional K-Means algorithm.

The thesis is organized as follows. Chapter 1 is the introduction. Chapter 2 presents the theoretical basis and technology. Chapter 3 prepares the original aviation customer data and constructs the features. Chapter 4 covers the parallelization of the K-Means algorithm on the Spark platform. Chapter 5 covers the optimization of the K-Means algorithm and its parallelization on the Spark platform. Chapter 6 presents the experiments and analyzes the results.
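
The Canopy-then-K-Means idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation: it is plain Python rather than Spark RDD code, the distance thresholds `t1` and `t2` and the sample points are hypothetical, and taking the mean of each canopy as the initial centroid is an assumption made here for illustration. The sketch only shows how a coarse Canopy pass can supply both the value of K (the number of canopies) and the initial centers for K-Means.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def canopy_centers(points, t1, t2):
    """Coarse Canopy pass; returns one center per canopy.

    t1 is the loose threshold (canopy membership), t2 the tight
    threshold (removal from the candidate list). Both are assumed values.
    """
    assert t1 > t2 > 0
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every remaining point within t1 of the seed joins this canopy.
        members = [p for p in remaining if dist(seed, p) < t1]
        centers.append(mean(members))
        # Points within t2 of the seed can no longer seed a new canopy.
        remaining = [p for p in remaining if dist(seed, p) >= t2]
    return centers

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist(p, centers[i]))

def kmeans(points, init_centers, max_iter=100, tol=1e-9):
    """Lloyd's K-Means iteration starting from the given centers."""
    centers = list(init_centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical two-feature customer records forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
initial = canopy_centers(points, t1=3.0, t2=1.5)  # K = len(initial)
centers, labels = kmeans(points, initial)
```

In this sketch K is never passed in by hand; it falls out of the Canopy pass, which is the property the abstract relies on to avoid choosing K and the initial points arbitrarily.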
Keywords/Search Tags:aviation customer segmentation, cluster analysis, K-Means, Canopy, Spark