
Research, Design And Application Of Clustering Algorithm Using MapReduce

Posted on: 2014-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Sun
GTID: 2248330395977612
Subject: Computer application technology
Abstract/Summary:
Clustering algorithms have long attracted attention in data mining research because they can discover the distribution structure of a dataset without any prior knowledge. With the development of network technology and industry in recent years, massive datasets have appeared rapidly, and classical algorithms are gradually becoming unable to cope with them. Adapting clustering algorithms to distributed platforms has therefore become a popular research direction.

This thesis first reviews the classical clustering algorithms, their history and current improvements, together with several recent clustering algorithms, in the hope of offering some new perspectives. It then adapts four clustering algorithms to the MapReduce framework and runs them on Hadoop, the Apache open-source distributed platform that implements the MapReduce programming model originally described by Google.

The k-means algorithm is the basis of many other algorithms, but several of its inherent defects cannot be avoided. Parallelizing the k-means++ algorithm, which improves the choice of initial centers, can effectively keep the clustering from being trapped in a poor local optimum.

DBSCAN is one of the classical density-based algorithms. Its distributed version partitions the point space into spatial regions and uses overlapping coverage between neighboring regions so that the combined result can replace that of the original algorithm.

The Affinity Propagation algorithm operates on a similarity matrix and gradually converges to its result through iteration. A distributed design makes it possible to handle the similarity matrices of large-scale, high-dimensional data.

Spectral clustering is a newer direction: it reduces the dimensionality of the data using the eigenspace of a symmetric similarity matrix and then completes the clustering with k-means. The parallelization strategy is also useful for computing the eigenvectors.

Finally, the feasibility of these algorithms is verified both experimentally and theoretically. In addition, running Hadoop on ordinary PCs can significantly reduce computation time.
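The abstract does not spell out how k-means++ is parallelized, so the following is only a minimal sketch under that caveat. It shows standard k-means++ (D²) seeding in Python with the per-point distance computation written as a map function and the weight total as a reduce, the two parts a MapReduce job would distribute. The names `map_min_sq_dist` and `kmeans_pp_seeds` and the fixed random seed are hypothetical, not taken from the thesis.

```python
import random
from functools import reduce


def map_min_sq_dist(point, centers):
    # Map step (hypothetical name): each point independently computes its
    # squared Euclidean distance to the nearest center chosen so far.
    return min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)


def kmeans_pp_seeds(points, k, seed=0):
    """Pick up to k initial centers from `points` using k-means++ (D^2) sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # On Hadoop these map calls would run in parallel over input splits.
        weights = [map_min_sq_dist(p, centers) for p in points]
        # Reduce step: the total weight normalizes the sampling distribution.
        total = reduce(lambda a, b: a + b, weights, 0.0)
        if total == 0:
            break  # every point coincides with an already chosen center
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, weights):
            acc += w
            # Points already chosen have weight 0 and are never picked again.
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers
```

Each new center needs one full pass over the data here, which is why scalable variants such as k-means|| oversample several candidates per pass instead.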
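The overlapping partitioning used by the distributed DBSCAN can be illustrated with a map step. This sketch assumes the space is cut into vertical strips of a hypothetical width `strip_width` along the x-axis (the thesis's actual partitioning scheme is not described in the abstract): each point is emitted to its home strip, and a point within `eps` of a strip boundary is also copied into the neighboring strip, giving every strip an overlap halo. The `shuffle` helper only stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict


def map_point(point, strip_width, eps):
    """Yield (strip_id, point) pairs for one point.

    Assumes strip_width >= eps, so a point's eps-neighbours lie in its own
    strip or in one of the two adjacent strips.
    """
    x = point[0]
    home = int(x // strip_width)
    yield home, point
    left_edge = home * strip_width
    right_edge = left_edge + strip_width
    # Copy points near a boundary into the neighbouring strip (the overlap halo).
    if x - left_edge <= eps:
        yield home - 1, point
    if right_edge - x <= eps:
        yield home + 1, point


def shuffle(points, strip_width, eps):
    """Group mapped pairs by strip id, as Hadoop's shuffle phase would."""
    groups = defaultdict(list)
    for p in points:
        for key, value in map_point(p, strip_width, eps):
            groups[key].append(value)
    return dict(groups)
```

With `strip_width >= eps`, any `eps`-neighbor of a point lies in the same or an adjacent strip and within `eps` of the shared boundary, so it reaches the point's home strip as a halo point. A reduce step would then run ordinary DBSCAN per strip and merge local clusters that share halo points; that merge step is omitted here.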
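The iterative Affinity Propagation updates mentioned in the abstract can be sketched on a single machine with plain Python lists. This follows the standard responsibility/availability message passing of Frey and Dueck with damping; the default damping factor and iteration count are assumptions, and the diagonal `S[k][k]` is assumed to hold each point's preference. The comments mark the data dependencies that make a MapReduce version possible: each row of `R` needs only the matching rows of `S` and `A`, and each column of `A` needs only the matching column of `R`.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster with Affinity Propagation on an n x n similarity matrix S (n >= 2).

    S[k][k] holds the preference of point k; higher values yield more exemplars.
    Returns (exemplar indices, label per point), labels being exemplar indices.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: row i needs only row i of S and A (one map task per row).
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: column k needs only column k of R (one map task per column).
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum of max(0, r(i', k)) over i' != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], []
    labels = [max(exemplars, key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
    return exemplars, labels
```

A fixed iteration count stands in for a proper convergence test; practical implementations usually stop once the exemplar set has stayed unchanged for a number of iterations.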
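For spectral clustering, the step the abstract singles out for parallelization is the eigenvector computation. The following minimal sketch is restricted to two clusters and uses power iteration instead of a full eigensolver; both simplifications are assumptions of this example. It builds the unnormalized Laplacian L = D − W from a symmetric similarity matrix W, approximates the Fiedler vector (the eigenvector of the second-smallest eigenvalue), and finishes with k-means on the resulting one-dimensional embedding. Power iteration is built from matrix-vector products in which each output entry needs only one row of L, so that product is what a MapReduce job could distribute. All function names are hypothetical.

```python
import math


def fiedler_vector(W, iterations=300):
    """Approximate the Laplacian eigenvector for the 2nd-smallest eigenvalue.

    W is a symmetric, non-negative similarity matrix. Power iteration is run on
    M = c*I - L, which has the same eigenvectors as L = D - W.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    c = 2.0 * max(deg) + 1.0  # Gershgorin: eigenvalues of L are <= 2*max(deg), so M is positive definite
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # deflate the constant eigenvector (eigenvalue 0 of L)
        # Matrix-vector product: each entry needs one row of L (one map task per row).
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def two_means_1d(values, iterations=20):
    """Plain k-means with k = 2 on scalar values; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        a = [x for x in values if abs(x - lo) <= abs(x - hi)]
        b = [x for x in values if abs(x - lo) > abs(x - hi)]
        if not a or not b:
            break
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]


def spectral_bipartition(W):
    # Embed each point by its Fiedler-vector entry, then finish with k-means.
    return two_means_1d(fiedler_vector(W))
```

Power iteration converges slowly when the second- and third-smallest eigenvalues of L are close; for more than two clusters one would take the first k eigenvectors and run k-means on the k-dimensional rows.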
Keywords/Search Tags: Clustering algorithms, Hadoop, MapReduce, Cloud computing, Data-parallel computing