Research and Implementation of a MapReduce-Based Graph Clustering Algorithm

Posted on: 2013-09-10   Degree: Master   Type: Thesis
Country: China   Candidate: G X He   Full Text: PDF
GTID: 2248330395985445   Subject: Computer technology
Abstract/Summary:
Cluster analysis is essential for extracting useful information from massive data sets, but it also faces great challenges. For discrete, heterogeneous, multi-dimensional mass data, traditional distance-based clustering algorithms often give unsatisfactory results, while graph-based clustering algorithms, which suit discrete attributes, have high computational complexity and are difficult to implement. Because distributed parallel computing on Hadoop performs well in data sampling and analysis, this thesis proposes a scalable distributed graph clustering method on the Hadoop platform that realizes graph clustering under the MapReduce model. Using this method, a MapReduce-based MST (Minimum Spanning Tree) clustering algorithm is designed and implemented.
The main work of the thesis is as follows. The MapReduce model and related algorithms are analyzed in detail, the current state of research on cluster analysis is reviewed, and on that basis the MapReduce-based MST clustering algorithm is proposed. The method addresses the problems of distance-based clustering algorithms and effectively reduces the computational cost of graph clustering. It substantially improves the efficiency of clustering mass data and is useful for mass-data analysis and processing. The thesis analyzes the parallel implementation of the MST clustering algorithm in detail, focusing on how to realize clustering under the MapReduce model. Text feature vector extraction, graph construction, and MST construction are each implemented under MapReduce, and the design is optimized with respect to the weight characteristics, the similarity measure, and the realization of MST clustering.
Finally, the thesis analyzes and compares the performance of the clustering algorithms through experiments, and discusses the prospects and possible extensions of the method.
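The abstract does not give the details of the MST clustering step, so the following is only a minimal single-machine sketch of the general idea, not the thesis's MapReduce implementation. It assumes Euclidean distance as the edge weight and a fixed number of clusters k, and the name mst_clusters is hypothetical. It runs Kruskal's algorithm over the complete graph and stops once k components remain, which is equivalent to building the MST and cutting its k-1 heaviest edges.

```python
from itertools import combinations
from math import dist


def mst_clusters(points, k):
    """Group points into k clusters via a truncated Kruskal MST.

    Assumptions (not from the thesis): Euclidean edge weights,
    a complete graph over the points, and a user-chosen k.
    Returns a sorted list of clusters, each a list of point indices.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Find the component root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, lightest first (the graph construction step).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        if components <= k:
            break  # the remaining (heavier) MST edges are the ones "cut"
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # accept this edge into the spanning forest
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
clusters = mst_clusters(points, 3)
```

In a distributed setting the pairwise weight computation and the edge sort are the parts that would most naturally be spread across map and reduce tasks; how the thesis partitions that work is not stated in the abstract.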
Keywords/Search Tags:mass data, Hadoop, MapReduce, distributed parallel computing, clustering, graph clustering algorithm, MST