Font Size: a A A

Research On Parallel Algorithm Of Graph Sequence Community Mining Based On MapReduce

Posted on:2013-02-15Degree:MasterType:Thesis
Country:ChinaCandidate:J TangFull Text:PDF
GTID:2248330362970892Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As an important branch of data mining, community mining of graph sequence is widelyapplied in practical problems such as social networks. It’s critical about how to obtain the valuableembedded information and how to accelerate algorithm on massive data sets. Current methodsalmost adopt the splitting algorithm based on tree-map record and bottom-up agglomerativealgorithm, and are static mining without consideration of time aspect. Based on coding cost, wepropose a graph sequence community mining algorithm GSCM, and propose GSCM-SC based onspectral clustering. With Hadoop Reduce parallel computation framework, we parallelizeGSCM-SC to propose the PGSCM parallel algorithm.Graph sequences in this paper are all binary. The paper firstly forwards the concept of graphsequence coding cost, and proposes the GSCM algorithm through optimizing this cost function.The algorithm makes balance between model complexity and community quality, so it candetermine the number of communities and their structures, not requiring any parameters. Initialstructure can be gained by clustering after information compression. Use the stochastic evolutionand selecting thought of genetic algorithm to avoid being trapped in local minima. It can timelydetect changes of community structure according to the coding cost fluctuation. And then validateGSCM via actual datasets.Based on spectral clustering, we propose GSCM-SC algorithm. By parallelizing performancebottlenecks, we propose parallel algorithm PGSCM. We parallelize the computing of similaritymatrix based on its independence of data points; parallelize the computation of Laplace matrixeigenvector using Lanczos method; parallelize K-Means based on independence of data points andcluster centers. And we construct a Hadoop system using several virtual machines to valid theeffectiveness of the algorithm and its performance promotion.At last, we introduce the gray aspect of graph sequence community mining, providing a goodresearch direction.
Keywords/Search Tags:graph sequence, community structure, cost function, MapReduce, parallelization
PDF Full Text Request
Related items