The Research Of Clustering Algorithm Based On Hadoop Cloud Computing Platform

Posted on: 2015-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Yan
Full Text: PDF
GTID: 2308330485990658
Subject: Integrated circuit engineering
Abstract/Summary:
With the rapid development of the mobile Internet, data is growing explosively, and the era of "big data" has arrived. How to analyze existing data accurately and efficiently, and extract as much value from it as possible, has become a common problem. Traditional data analysis methods cannot handle data at this scale effectively, so new data processing methods are urgently needed.

Cloud computing has emerged in response. The term covers a variety of technologies and business models. MapReduce, originally proposed by Google, is an important branch of cloud computing technology. It is a parallel computing model that, compared with traditional approaches to parallel computing, greatly simplifies the development of parallel programs.

Clustering analysis has long been an important topic in data mining. K-means is one of the most common clustering algorithms; compared with other algorithms, it is easy to implement and often produces satisfactory results.

Against this background, this paper uses MapReduce to parallelize clustering analysis. It analyzes the computing principle of the MapReduce framework and focuses on a parallel K-means algorithm based on MapReduce and on the implementation of the Canopy-Kmeans algorithm. To validate the MapReduce approach and the effectiveness of the proposed parallel K-means algorithm, a Hadoop platform was built and several groups of experiments were carried out. The results show that both the parallel K-means algorithm and the Canopy-Kmeans algorithm have good speedup and scalability. In addition, the Canopy-Kmeans algorithm is both more accurate and faster than the K-means algorithm.
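As an illustration of the idea described in the abstract, the following is a minimal single-machine sketch in Python of how one K-means iteration can be split into a map step and a reduce step, with a simplified canopy pass choosing the initial centers. It is not the thesis's Hadoop implementation: the function names (kmeans_map, kmeans_reduce, kmeans_iteration, canopy_centers), the sample points, and the threshold t2 are all assumptions made for this example, and the canopy step uses only one distance threshold rather than the usual two.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans_map(point, centers):
    """Map step: emit (cluster index, (point, 1)) for one input point."""
    return nearest(point, centers), (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that were assigned to one cluster."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centers):
    """One round: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centers)
        groups[key].append(value)  # stands in for the shuffle phase
    # A center with no assigned points keeps its old position.
    return [kmeans_reduce(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


def canopy_centers(points, t2):
    """Simplified canopy seeding (assumption: single threshold t2).

    Take a point as a center, discard every point within t2 of it, repeat.
    """
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if dist(p, c) > t2]
    return centers


# Hypothetical sample data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centers = canopy_centers(points, t2=3.0)
for _ in range(5):
    centers = kmeans_iteration(points, centers)
print(centers)  # [(1.0, 1.5), (9.5, 9.0)]
```

In a real MapReduce job the map calls would run in parallel over input splits and the framework would perform the grouping by key; here a dictionary plays that role so the sketch stays self-contained.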
Keywords/Search Tags: MapReduce, Clustering, K-means, Canopy, Parallel algorithms