Research And Implementation Of MapReduce-based Graph Clustering Algorithm

Posted on:2013-09-10

Degree:Master

Type:Thesis

Country:China

Candidate:G X He

Full Text:PDF

GTID:2248330395985445

Subject:Computer technology

Abstract/Summary:

Cluster analysis is particularly important for efficiently extracting the information we need from massive data, but it also faces great challenges. For discrete, heterogeneous, multi-dimensional massive data, traditional distance-based clustering algorithms often give unsatisfactory results, while graph-based clustering algorithms, which suit discrete attributes, have higher computational complexity and are harder to implement. Because Hadoop's distributed parallel computing performs well in data sampling and analysis, this thesis proposes a scalable distributed graph clustering method on the Hadoop platform that realizes graph clustering under the MapReduce model. Using this method, a MapReduce-based MST (Minimum Spanning Tree) clustering algorithm is designed and implemented.

The main work of the thesis is as follows. The MapReduce model and related algorithms are analyzed in detail, the current state of research on cluster analysis is reviewed, and on that basis the MapReduce-based MST clustering algorithm is proposed. The method addresses the problems of distance-based clustering algorithms and also effectively reduces the computational cost of graph clustering. It largely improves the efficiency of clustering massive data and is expected to play a significant role in the analysis and processing of massive data.

The thesis analyzes in detail the parallel implementation of the MST clustering algorithm, focusing on how to realize the clustering algorithm under the MapReduce model. Under this model, text feature vector extraction, graph construction, and MST construction are each implemented, and the algorithm is optimized with respect to the characteristics of the weights, the similarity measure, and the realization of MST clustering.

Finally, the thesis analyzes and compares the performance of the clustering algorithm through experiments and discusses the prospects and possible functional extensions of the method.
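The pipeline named in the abstract (text feature vector extraction, graph construction, MST construction, clustering) can be illustrated with a minimal single-process sketch. It is not the thesis's implementation and does not use the Hadoop API: the function names, the bag-of-words features, the cosine distance, Kruskal's algorithm, and the rule of cutting the k-1 heaviest MST edges are all assumptions chosen for illustration, and the "map" and "reduce" labels only indicate where such steps might sit in a MapReduce job.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed feature extraction)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def map_edges(docs):
    """'Map'-like step: emit one weighted edge (weight, i, j) per document pair."""
    vecs = [tf_vector(d) for d in docs]
    for i, j in combinations(range(len(vecs)), 2):
        yield cosine_distance(vecs[i], vecs[j]), i, j


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def reduce_mst(n, edges):
    """'Reduce'-like step: Kruskal's algorithm over all emitted edges."""
    parent = list(range(n))
    mst = []
    for w, i, j in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    return mst


def mst_clusters(docs, k):
    """Build the MST, drop its k-1 heaviest edges, return the k components."""
    n = len(docs)
    mst = reduce_mst(n, map_edges(docs))
    kept = sorted(mst)[: max(n - k, 0)]  # keep the n-k lightest MST edges
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


docs = [
    "hadoop mapreduce cluster",
    "hadoop mapreduce job",
    "minimum spanning tree",
    "spanning tree edge",
]
print(mst_clusters(docs, k=2))
```

In this toy input the first two documents share two terms, as do the last two, so cutting the single heaviest MST edge separates them into two groups. A distributed version would have to partition the pairwise edge computation and merge partial spanning trees, which this sketch does not attempt.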

Keywords/Search Tags:

mass data, Hadoop, MapReduce, distributed parallel computing, clustering, graph clustering algorithm, MST

Related items

1	Parallel Clustering Algorithm Based On MapReduce
2	Research And Implementation Of Parallel Clustering Algorithm Based On Approximate Spectrum Hadoop MapReduce
3	Research, Design And Application Of Clustering Algorithm Using MapReduce
4	The Research Of Clustering Algorithm Based On Hadoop Cloud Computing Platform
5	Accelerating Clustering Algorithm On The CUDA Graphics Processor
6	Research On Distributed Fast Clustering Algorithm Based On MapReduce
7	Design And Implementation Of Clustering Algorithm For Large Scale Chinese Short Text Based On MapReduce
8	Research On Parallel Clustering Algorithm For Large-Scale Data Set
9	The Research Of Parallel Clustering Algorithm Based On Hadoop Platform
10	Design And Implementation Of High Performance Text Clustering Algorithm Based On Hadoop