Research And Implementation Of Loss Model Of Frequent Flyers Based On Spark

Posted on: 2018-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: L J Lu
Full Text: PDF
GTID: 2348330533466795
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of tourism and business travel, aviation data is becoming increasingly valuable. Much of that value lies in analyzing the behavior of airline passengers: such analysis helps assess member loyalty and predict whether a member will churn. This thesis aims to build an effective churn (loss) model on the frequent-flyer data of China Southern Airlines and to present that model's results in full.

The volume of aviation data keeps growing as technology advances, and data mining techniques for aviation big data are developing quickly as well. However, commonly used standalone machine learning algorithms and standalone data mining software can hardly support the analysis of massive aviation data, which led to the emergence of Hadoop's MapReduce, a distributed computing framework. Because Spark offers better performance and higher development efficiency than Hadoop while remaining compatible with it, Spark was chosen as the parallel computing framework for this thesis. Furthermore, because a single algorithm model has limited effectiveness, this thesis stacks several single models into an ensemble and merges the stacking implementation into the Spark source code as a general-purpose public interface. Finally, the thesis combines the mining models with a display system that presents the analysis results as charts and tables on web pages.

The system has two parts: an offline model and an online display system. Based on a business analysis of the China Southern Airlines loss model, the offline model follows the data mining process: analyzing the raw data, preprocessing, feature engineering, feature analysis, feature selection, model training, prediction, model evaluation, and model combination. Clustering analysis is then performed on the members predicted to churn, which helps characterize those members more clearly. Using reporting technology, the web system displays the analysis results of the offline model and lets administrators download them.

The last part of the thesis tests both the function and the performance of the entire system. The results indicate that the loss model can efficiently predict which members will churn and gives a direct, clear description of them. In summary, applying parallelized, ensemble-strengthened machine learning techniques to real, large-scale operational data, with a C/S architecture providing a user-friendly display of the analysis results, is in line with the development trend of the information industry and meets its requirements.
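The stacking of several single models mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's Spark implementation: it is plain Python with no Spark dependency, and the two features (flights in the last year, days since the last flight), the synthetic labels, the decision-stump base learners, and the logistic-regression meta-learner are all assumptions chosen only to show the idea. Each base model produces out-of-fold predictions, and a meta-model is then trained on those predictions.

```python
import math

def fit_stump(rows, labels, feature):
    """Base learner (assumed): best single-threshold rule on one feature."""
    best = (None, -1.0, 1)
    for t in sorted({r[feature] for r in rows}):
        for sign in (1, -1):
            preds = [1 if sign * (r[feature] - t) >= 0 else 0 for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[1]:
                best = (t, acc, sign)
    t, _, sign = best
    return lambda r: 1.0 if sign * (r[feature] - t) >= 0 else 0.0

def fit_logistic(X, y, epochs=500, lr=0.5):
    """Meta-learner (assumed): logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias term

    def prob(xi):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = prob(xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)
    return prob

def fit_stacking(rows, labels, base_fitters, k=3):
    """Train the meta-model on out-of-fold base predictions, then refit bases."""
    n = len(rows)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        for j, fit in enumerate(base_fitters):
            base = fit(tr_rows, tr_labels)
            for i in held_out:
                meta_X[i][j] = base(rows[i])
    meta = fit_logistic(meta_X, labels)
    bases = [fit(rows, labels) for fit in base_fitters]
    return lambda r: meta([b(r) for b in bases])

# Synthetic data (hypothetical): (flights_last_year, days_since_last_flight).
rows = [((i * 7) % 12, i * 6) for i in range(60)]
# Hypothetical labeling rule: inactive for more than 180 days counts as churned.
labels = [1 if days > 180 else 0 for _, days in rows]

base_fitters = [
    lambda r, y: fit_stump(r, y, 0),  # stump on flight count
    lambda r, y: fit_stump(r, y, 1),  # stump on inactivity in days
]
model = fit_stacking(rows, labels, base_fitters)

predictions = [1 if model(r) >= 0.5 else 0 for r in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("training accuracy:", accuracy)
```

Calling `model((2, 300))` returns the stacked churn probability for a hypothetical member with 2 flights and 300 days of inactivity. Training the meta-model on out-of-fold predictions, rather than on predictions for rows the base models have already seen, keeps it from simply trusting whichever base model overfits most.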
Keywords/Search Tags: Spark, Machine Learning, Model Ensembling, Report Technique