
Design and Implementation of an Iterative Distributed Clustering Framework Based on Model Fusion

Posted on: 2013-02-21  Degree: Master  Type: Thesis
Country: China  Candidate: T Zhang  Full Text: PDF
GTID: 2218330362960723  Subject: Computer software and theory
Abstract/Summary:
Today's data is large in volume, stored across different locations, and increasingly privacy-sensitive. Because network bandwidth, privacy protection requirements, and the processing capacity of a single machine are all limited, it is difficult to gather all of the data in one place for unified clustering analysis. Distributed clustering has therefore become a research hotspot. However, existing distributed clustering algorithms and frameworks have several shortcomings: the single-node algorithm must be modified to follow the programming specifications of the distributed environment, the result fusion method is simplistic, and the results are sensitive to the local data distribution. To address these problems, this thesis presents an iterative distributed clustering framework based on model fusion.

First, the thesis describes the design principles of the iterative distributed framework based on model fusion. The framework has two main parts: the local clustering phase and the global optimization phase. Its advantages are that the single-node algorithm does not need to be modified, the iterative process avoids the impact of the local data distribution, and the network bandwidth limitations and data privacy issues of a distributed environment are addressed.

Second, the thesis implements a distributed K-means algorithm, M-K-means, on top of the proposed framework. In the experiments, M-K-means is compared with the single-node K-means algorithm and with a distributed clustering algorithm that uses a weighted-mean approach for result fusion.

Finally, M-K-means is extended to the Hadoop cloud computing environment. Because Hadoop is not well suited to iterative algorithms, the thesis optimizes the Hadoop processing flow, and the optimized M-K-means is compared with Mahout K-means.

According to the experimental results and analysis, the proposed framework not only improves efficiency but also increases the accuracy of distributed clustering results to some extent. It also performs well in a cloud computing environment and gives good practical results.
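The abstract does not give the details of M-K-means, so the following is only a minimal sketch of the general idea it describes: a local clustering phase that runs an unmodified single-node K-means on each node, followed by a global optimization phase that fuses the local models, repeated iteratively. The fusion step used here (a count-weighted K-means over the local centroids) and all names (`kmeans`, `distributed_kmeans`, `partitions`, `rounds`) are assumptions made for illustration, not the thesis's actual method. Only centroids and counts leave a node, never raw points.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans(points, centers, weights=None, iters=20):
    """Plain Lloyd's K-means with optional point weights.

    Returns the centers and the total weight assigned to each center
    in the last assignment pass.
    """
    weights = weights or [1.0] * len(points)
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    mass = [0.0] * len(centers)
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centers]
        mass = [0.0] * len(centers)
        for p, w in zip(points, weights):
            i = nearest(p, centers)
            mass[i] += w
            for d, x in enumerate(p):
                sums[i][d] += w * x
        new_centers = []
        for i, c in enumerate(centers):
            if mass[i] > 0:
                new_centers.append(tuple(s / mass[i] for s in sums[i]))
            else:
                new_centers.append(c)  # keep an empty cluster's center unchanged
        centers = new_centers
    return centers, mass

def distributed_kmeans(partitions, k, rounds=5, seed=0):
    """Hypothetical iterative scheme: local clustering, then model fusion."""
    rng = random.Random(seed)
    global_centers = rng.sample(partitions[0], k)  # assumed initialization
    for _ in range(rounds):
        # Local clustering phase: each node runs the unmodified single-node
        # algorithm on its own data, seeded with the current global model.
        summaries = []
        for part in partitions:
            centers, counts = kmeans(part, global_centers)
            summaries.extend((c, n) for c, n in zip(centers, counts) if n > 0)
        # Global optimization phase (assumed fusion rule): cluster the local
        # centroids, weighting each one by the number of points behind it.
        local_centroids = [c for c, _ in summaries]
        local_counts = [n for _, n in summaries]
        global_centers, _ = kmeans(local_centroids, global_centers, local_counts)
    return global_centers
```

Feeding the fused model back as the starting point of the next local round is what makes the scheme iterative; a one-shot weighted mean of local results, like the baseline mentioned in the abstract, would skip that feedback loop.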
Keywords/Search Tags: Distributed Data Mining, Clustering, K-means, Hadoop