Design And Its Implementation Of Iterative Distributed Clustering Framework Based On Model Fusion

Posted on:2013-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:T Zhang

GTID:2218330362960723

Subject:Computer software and theory

Abstract/Summary:

Today's information is characterized by huge volume, storage in different locations, and a growing share of privacy-related data. Because network bandwidth, privacy protection requirements, and the processing capacity of a single machine are all limited, it is difficult to gather all of the data in one place for unified clustering analysis. Distributed clustering has therefore become a research hotspot. However, existing distributed clustering algorithms and frameworks have several shortcomings: the single-node algorithm must be modified to fit the programming model of the distributed environment, and the result-fusion methods are simplistic and sensitive to the local data distribution at each site. To address these problems, this thesis presents an iterative distributed clustering framework based on model fusion.

First, the thesis describes the design principles of this iterative framework. It has two main parts: a local clustering phase and a global optimization phase. Its advantages are that the single-node algorithm does not need to be modified, the iterative process avoids the influence of localized data distributions, and the network bandwidth limitations and data privacy issues of the distributed environment are resolved.

Second, based on the proposed framework, the thesis implements a distributed K-means algorithm, M-K-means. In the experiments, M-K-means is compared with single-node K-means and with a distributed clustering algorithm that fuses results using a weighted mean.

Finally, M-K-means is extended to the Hadoop cloud computing environment. Because Hadoop is not well suited to iterative algorithms, the thesis optimizes the Hadoop processing flow, and the optimized M-K-means is compared with Mahout K-means.

The experimental results and analysis show that the proposed framework not only improves efficiency but also increases the accuracy of distributed clustering results to some extent. It also performs well in the cloud computing environment, and the framework shows good results in practice.
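The two-phase idea above (local clustering at each site, then a global optimization step, repeated) can be illustrated with a minimal Python sketch. It is not the author's M-K-means. The abstract does not say how the global optimization works, so this sketch assumes a weighted re-clustering of the pooled local centroids, with each centroid weighted by its cluster size. It also assumes that each site restarts its unmodified local K-means from the current global model on every round. The names `model_fusion_kmeans`, `weighted_kmeans`, and `sites` are hypothetical. Only centroids and cluster sizes leave a site, which mirrors the bandwidth and privacy argument in the abstract.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, weights, centroids):
    """Assign each weighted point to its nearest centroid; return per-cluster sums and weights."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0.0] * len(centroids)
    for p, w in zip(points, weights):
        j = min(range(len(centroids)), key=lambda i: _dist2(p, centroids[i]))
        counts[j] += w
        for d, x in enumerate(p):
            sums[j][d] += w * x
    return sums, counts


def weighted_kmeans(points, weights, init, max_iter=100):
    """Plain single-node Lloyd's K-means on weighted points (hypothetical helper).

    Returns (centroids, cluster_weights); an empty cluster keeps its previous centroid.
    """
    centroids: List[Point] = [tuple(c) for c in init]
    for _ in range(max_iter):
        sums, counts = _assign(points, weights, centroids)
        new = [tuple(s / counts[j] for s in sums[j]) if counts[j] > 0 else centroids[j]
               for j in range(len(centroids))]
        if new == centroids:  # centroids stopped moving
            break
        centroids = new
    _, counts = _assign(points, weights, centroids)
    return centroids, counts


def model_fusion_kmeans(sites, k, rounds=20, seed=0):
    """Iterative local-clustering / global-fusion loop (assumed fusion rule, see lead-in)."""
    rng = random.Random(seed)
    # Assumption: the initial global model is k points sampled from the first site.
    global_c = rng.sample(sites[0], k)
    for _ in range(rounds):
        # Local clustering phase: each site runs the unmodified single-node
        # algorithm from the current global model and ships only its
        # centroids and cluster sizes, never its raw records.
        pooled, pooled_w = [], []
        for data in sites:
            cents, sizes = weighted_kmeans(data, [1.0] * len(data), global_c)
            for c, n in zip(cents, sizes):
                if n > 0:
                    pooled.append(c)
                    pooled_w.append(n)
        # Global optimization phase (assumed): re-cluster the pooled local
        # centroids, weighting each by the size of the cluster it summarizes.
        new_c, _ = weighted_kmeans(pooled, pooled_w, global_c)
        converged = all(_dist2(a, b) < 1e-12 for a, b in zip(new_c, global_c))
        global_c = new_c
        if converged:
            break
    return global_c


if __name__ == "__main__":
    # Two hypothetical sites, each holding part of two well-separated groups.
    sites = [
        [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)],
        [(1.0, 1.0), (11.0, 10.0), (11.0, 11.0), (0.5, 0.5)],
    ]
    print(sorted(model_fusion_kmeans(sites, k=2)))
```

Because the fusion step re-clusters the pooled centroids rather than averaging the j-th centroid of every site by index, it does not depend on the sites' local cluster labels lining up. That property is one plausible reading of why a model-fusion step could be less sensitive to localized data distributions than the weighted-mean baseline, but the thesis's actual rule may differ.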

Keywords/Search Tags:

Distributed Data Mining, Clustering, K-means, Hadoop

Related items

1	Design And Its Implementation Of Iterative Distributed Clustering Framework Based On Model Fusion
2	Study On Key Techniques Of Distributed Data Mining Based On Hadoop
3	The Research And Design Of Distributed Data Mining System Based On Hadoop
4	Research Of Clustering Mining Algorithm Oriented Big Data
5	Research On Clustering Algorithms In Data Mining
6	Design And Implementation Of Distributed Clustering Framework Based On Model Fusion
7	A Research And Implementation With Improved K-Means Clustering Algorithm To Image Retrieval System Based On Hadoop Platform
8	The Research And Application Of Security Log Clustering Mining Algorithm Based On Hadoop Platform
9	Research And Implementation Of Distributed Clustering Algorithm Based On Hadoop Platform
10	Research On Mining Taxi Pick-up Hotspots Area Based On Big Data Hadoop Platform