Research And Application Of MapReduce Clustering Method

Posted on: 2019-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhang
Full Text: PDF
GTID: 2348330569988251
Subject: Computer technology
Abstract/Summary:
Clustering is an important method in data mining, but as data volumes grow sharply, traditional single-machine clustering algorithms can no longer meet the needs of real-time data processing. How to implement clustering algorithms in a distributed cluster environment has therefore become a hot research topic. Among clustering algorithms, K-means is widely applied because of its simplicity and efficiency. However, it has two shortcomings: the initial cluster centers are selected at random, and the number of clusters must be chosen from subjective experience. These two points are also the key targets for its improvement and optimization. This paper focuses on implementing the K-means algorithm on a distributed cluster and addresses these shortcomings to improve the accuracy and efficiency of the algorithm. First, a distributed Canopy-Kmeans clustering algorithm is implemented. It uses the coarse clustering performed by the Canopy algorithm to obtain the initial cluster centers and the K value for K-means, which reduces the number of K-means iterations. Experiments show that the algorithm effectively reduces running time. Second, because the parameters of the Canopy-Kmeans algorithm must be determined mainly through repeated testing, an improved hash-based K-means algorithm is proposed. It optimizes the original algorithm with a maximum-minimum distance strategy, so that the initial cluster centers are closer to the real cluster centers. The experimental results show that its accuracy and efficiency are better than those of the algorithms it was compared against. Finally, the two algorithms are applied to the analysis of real data from civil aviation ticket agents. The experiments show that the algorithms are effective and feasible and have practical value, and that they can provide a basis for airline management and sales planning.
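The Canopy-Kmeans idea described in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the thesis's MapReduce implementation: the function names, the single tight threshold `t2`, and the toy data are assumptions made for illustration. Canopy clustering normally also uses a looser threshold T1 to assign points to overlapping canopies; it is omitted here because only the canopy centers are needed to seed K-means.

```python
import math
import random


def canopy_centers(points, t2, seed=0):
    """Pick canopy centers. Every point within t2 of a chosen center
    is dropped from the candidate list, so the number of centers
    returned serves as K for the K-means step."""
    rng = random.Random(seed)
    remaining = list(points)
    rng.shuffle(remaining)
    centers = []
    while remaining:
        center = remaining.pop()
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers.
    Returns the final centers and the number of iterations used."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iter


# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
initial = canopy_centers(points, t2=3.0)
final, iterations = kmeans(points, initial)
print(len(initial), sorted(final), iterations)
```

Because each seed already lies inside one group, K-means only has to move it to that group's mean, which is the sense in which Canopy seeding reduces the iteration count. In a distributed setting, the assignment step would typically run in mappers and the mean recomputation in reducers; that mapping is a common pattern and is not taken from the thesis text.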
Keywords/Search Tags: Airline ticket agent, clustering algorithm, distributed cluster, MapReduce