The Research Of Spatial Clustering Analysis Based On Cloud Computing

Posted on: 2013-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: G C Zhao
Full Text: PDF
GTID: 2248330377458327
Subject: Computer software and theory
Abstract/Summary:
Spatial clustering analysis, an important research direction within cluster analysis, divides a spatial data set into classes of similar data objects, so that objects in the same class are more similar to one another than to objects in different classes. It is not only an important method of spatial data mining but also a prelude to other mining tasks.

Nowadays, with the rapid development of information technology, data are becoming increasingly diverse, massive, and high-dimensional. Given such large amounts of spatial information, how to quickly and accurately extract implicit, useful knowledge to guide practice is becoming an urgent need. However, on a stand-alone serial PC, cluster analysis runs into bottlenecks such as memory capacity and CPU processing speed, which makes it difficult to meet actual demand. Virtualization and parallel technology provide a good solution.

Cloud computing, a research focus at home and abroad, is a development of grid computing, parallel computing, and distributed computing; its ideas grew out of parallel computing and storage. It can effectively solve the problems encountered in analyzing and processing massive data, and it provides strong support for the cluster analysis of massive high-dimensional spatial data. In particular, Google's MapReduce distributed programming model makes parallelizing clustering algorithms simpler and more reliable.

This thesis studies spatial clustering algorithms in depth using the cloud computing technologies HDFS (the Hadoop Distributed File System) and MapReduce, and implements the K-Medoids and PGDC algorithms on the MapReduce model. The improved algorithms are then simulated, and the experimental results are analyzed in depth.
This thesis covers the following aspects:
1) Study spatial clustering algorithms, and analyze the basic principles, advantages, and disadvantages of the various clustering algorithms.
2) Study the ideas of parallelization and cloud computing and their key technologies. Analyze spatial clustering algorithms in cloud computing environments, and combine the clustering algorithms with the MapReduce programming model. Study combined parallel clustering models. Based on an analysis and comparison of parallel algorithms such as K-Means and Canopy-K-Means on the Hadoop platform, propose and implement improved parallel K-Medoids and PGDC algorithms based on grid density.
3) Simulate the parallel clustering algorithms. The experimental results are analyzed in terms of the validity, speedup, and scalability of the algorithms. Then, using a grain storage site model as an example, the algorithms are tested in a practical application. The experimental results show that the proposed parallel clustering algorithms efficiently obtain better clustering results on large data sets, with strong storage capacity and computing speed, and that they offer high practical utility and scalability.
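
To illustrate how a clustering algorithm can be combined with the MapReduce programming model, the following is a minimal single-process sketch of textbook K-Medoids split into a map step and a reduce step. It is not the improved algorithm proposed in the thesis and does not use Hadoop. The function names (map_assign, reduce_update, k_medoids) and the choice of Euclidean distance are assumptions made for this example only.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, medoids):
    """Map step: emit (index of the nearest medoid, point)."""
    nearest = min(range(len(medoids)), key=lambda i: dist(point, medoids[i]))
    return nearest, point


def reduce_update(members):
    """Reduce step: choose the member with the smallest total distance to the rest."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def k_medoids(points, initial_medoids, max_iter=20):
    """Repeat map, group-by-key, and reduce until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            key, value = map_assign(p, medoids)
            groups[key].append(value)
        # An empty cluster keeps its previous medoid.
        new_medoids = [
            reduce_update(groups[i]) if i in groups else medoids[i]
            for i in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_medoids(points, [(0, 1), (10, 11)]))
```

In a real MapReduce job each map call would run on a separate data split and the grouping would be done by the framework. The reduce step here compares every pair of points in a cluster, which is the cost that makes plain K-Medoids expensive on large data sets.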
Keywords/Search Tags: Cloud computing, MapReduce, Spatial Clustering, K-Medoids, PGDC