Font Size: a A A

Design And Implementation Of Parallel Peer Pressure Map Clustering Algorithm Based On Linear Algebra

Posted on:2018-09-11Degree:MasterType:Thesis
Country:ChinaCandidate:P G ZouFull Text:PDF
GTID:2358330542985206Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Large scale graph computation is becoming incredibly significant in many applications,such as,web search,computational biology,knowledge discovery and so on.However,with the advent of the Big Data Age,graph data grows massively,serial algorithms become inefficient to meet the graph-theoretic analyses demands of emerging“big data”applications.As a result,it is essential to speed up the underlying graph problems on current high performance computing system.However,large-scale graph computations are difficult to parallelize and achieve good performance due to the irregularity of the graph itself.High performance graph computing is an infant field,in sharp contrast with numerical scientific computing.How to achieve high performance in large scale graph computation is a problem need to be resolved.Due to the fact that the graph algorithms can be represented as a series of basic linear algebra operation,we can use traditional matrix computing technology to solve the graph problem effectively.We argue that lots of the tools of high performance numerical computing-in particular,data structures and parallel algorithms for computation with sparse matrices-can form the nucleus of a robust infrastructure for parallel computing on graphs.GraphBLAS(istc-bigdata.org/GraphBLAS)defines a set of graph primitives based on sparse matrices,and a series of graph algorithms can be implemented in many programming environments.A representative implementation of this approach is found in the combinatorial BLAS,which is essentially an inheritance and extension of the sparse BLAS.Graph clustering is a process of determining natural groups with high connectivity in a graph,which is widely used in pattern recognition,biological information and image analysis.The peer pressure clustering is a simple and efficient graph clustering algorithm that based on random walk.In this paper,we mainly focus on Combinatorial BLAS and Peer Pressure Clustering.In the first part,after briefly introducing the architecture and of Combinatorial BLAS and researching the performance of some related graph primitives supported by Combinatorial BLAS,we evaluate the performance of the BFS algorithm on the Dawning supercomputer.In the second part,we implement a scalable parallel algorithm for peer pressure clustering using the sparse matrix infrastructures in Combinatorial BLAS.Firstly,we represent the peer pressure clustering algorithm in a series of linear algebra operations.Secondly,we help to determine the proper algorithm and data structure for each basic linear algebraic operation.Thirdly,we implement the transformed parallel peer pressure algorithm based on MPI and design a parallel algorithm for the most time-consuming part of the algorithm,which were implemented based on both MPI and MPI-OpenMP in Combinatorial BLAS.On real instance,when the scale of graph which represented by a sparse matrix comes to 4300 billion,the parallel peer pressure clustering algorithm,based on linear algebraic,can achieve high performance and good scalability on dawning supercomputer.The parallel efficiency can up to 76.8%when the number of core scales to 1024 using MPI,and the parallel efficiency can up to 46.4%when the number of core scales to 2048 using MPI+OpenMP.The sparse matrices computing technology which involved in high performance computing can definitely form the nucleus of a robust infrastructure for parallel computing on large scale graphs.By decomposing the graph computing problem into serial sparse matrix operations involved in the numerical computing,we can represent the irregular data structure and access pattern in the graph computing in a structured format.After optimizing the transformed graph algorithm by the sparse matrices computing technology,we can implement parallel graph algorithms on the supercomputer computing environment with high performance.
Keywords/Search Tags:Graph computing, Parallel, Peer pressure clustering, Sparse matrix, Large scale graph, Combinatorial BLAS
PDF Full Text Request
Related items