
Research on the Cloud-Based Parallel Clustering Algorithm

Posted on: 2012-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Xi  Full Text: PDF
GTID: 2218330338463120  Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology and database technology, and especially the widespread use of the World Wide Web, the amount of data that needs to be analyzed and managed is increasing rapidly. Data mining emerged to meet this demand. Clustering is one of the main methods in data mining, and in-depth research on how to improve its performance is of great significance.

As a recent research hotspot, cloud computing is a development of grid computing, parallel computing, and distributed computing. Using cloud computing technology, people can easily obtain computing power, storage capacity, and infrastructure over the network and can analyze and manage massive data effectively, which both lowers the requirements on terminal equipment and improves data processing capability.

This thesis mainly studies how to use the parallel computing capability of a cloud cluster system to solve the clustering problem in massive data processing. First, after a detailed analysis of the DBSCAN algorithm, the thesis presents a hierarchical algorithm, HDBSCAN. This algorithm not only corrects the poor clustering results caused by an inappropriate choice of the input parameter Eps, masking the sensitivity to that parameter, but also reduces the number of queries and the I/O overhead because it does not examine every point. The thesis then builds a cloud environment with Hadoop and combines the HDBSCAN algorithm with the MapReduce programming model. Finally, it tests and compares the functionality and performance of the algorithm in the cloud computing environment. The results show that deploying the HDBSCAN algorithm on the cluster can improve the efficiency of clustering.

This thesis contributes useful research on clustering algorithms based on cloud computing.
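To make the Eps sensitivity mentioned in the abstract concrete, here is a minimal sketch of the standard DBSCAN baseline in plain Python. It is not the thesis's HDBSCAN algorithm, whose details the abstract does not give. The names `region_query`, `dbscan`, `min_pts`, and the sample points are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns one cluster label (or NOISE) per point."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical sample data: two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
for eps in (0.5, 1.5, 100.0):
    print(eps, dbscan(points, eps, min_pts=3))
```

On this sample, a too-small Eps marks every point as noise and a too-large Eps merges everything into one cluster, which is the kind of parameter sensitivity the abstract says HDBSCAN is meant to mask. Note also that this baseline issues one region query per point, the cost the abstract says HDBSCAN reduces.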
Keywords/Search Tags: Density-Based Clustering, Hierarchical Clustering, Cloud Computing, MapReduce