Font Size: a A A

Parallel Algorithms And Architectures For Graph Computations On FPGA

Posted on:2016-10-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:G Q LeiFull Text:PDF
GTID:1318330536467128Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Recently,as the FPGA grows dramatically in computing,storage and logic resources,FPGA-based reconfigurable computing has been becoming an important domain in high performance computing.Graph computations are the key components of ”Big Data” analysis and play an important role in ”Big Data” processing.FPGA-based custom computing systems have been one of best potential choices for accelerating these applications.However,there are some challenges to overcome: parallel algorithm design,parallelism exploring and limited processing scales.In order to deal with these challenges,this thesis emphasizes on the implementations of graph computations on FPGAs.The main works and contributions of this thesis are summarized as follows:(1)An FPGA-based parallel algorithm and implementation for large shortest path computations is proposed.Existing implementations of Single Source Shortest Path algorithm on FPGA are incapable of processing large scale problems efficiently by storing the graph and results using the on-chip memories.In order to exploit parallelism,a parallel algorithm to removed multiple vertices from the priority queue in each iteration is proposed,which is derived from the variant of Eager Dijkstra algorithm.In order to process large priority queue processing,we store the overflowed queue-elements using the off-chip memories and propose a scheme to get the off-chip queue-elements back to the internal priority queue to guarantee the correctness of priority queue processing.The experimental results on the real road networks show that our design can achieve speedup of5 X over the CPU implementation,and the power consumption is only 1/4 of the latter.(2)An FPGA-based parallel algorithm and implementation for large minimal spanning tree computations is proposed.Existing implementations of minimal spanning tree computations on FPGA lack of parallelism and are incapable of processing large scale problems.In order to exploit parallelism,a Prim-based parallel minimal spanning tree algorithm on FPGA is proposed.Several start vertices are chosen and to generate multiple subtrees in parallel with each subtree executing Prim algorithm.The subtree stops exploring while the vertex belonging to other subtrees has been checked.Then a new start vertex will be chosen to grow another subtree.All subtrees will be combined while all vertices have been checked.We also propose the priority queue implementations for large single subtree growing with Prim algorithm.The overflowed queue-elements are stored in the off-chip memories and read back while needed.The experimental results on the randomly generated graphs show that our design can achieve speedup of 2.58X-6.88 X over the sequential Prim implementation on CPU platform.(3)A message-passing parallel BFS algorithm and implementation for large scale graph exploring computations is proposed.To deal with the long communication latency of large scale parallel BFS algorithm,a novel 2D message-passing structure using onchip memories is proposed to reduce the communication latencies between processing elements.Compared with related work,our design reduces on-chip memory consumptions significantly and can be easily scaled to multiple FPGA systems.We also propose a distributed bitmap-based current/next queue implementation method which avoids the off-chip memory accesses for determining whether the vertex is in the current level or not.Limited by the long latency of the off-chip memory accesses,our design on a single FPGA shows lower performance than related work for both Random and RMAT graphs.However,our design is highly scalable and can be mapped to arbitrary number of FPGAs theoretically.(4)An FPGA-based parallel algorithm and implementation for large scale graph matching computations is proposed.Existing implementations on FPGA use on-chip memories to store the graph data and are incapable of processing large problem for bipartite graph maximal weighted matching.We propose a parallel algorithm which processes multiple unassigned vertices in parallel and propose an architecture for large bipartite graph matching processing by storing the graph data in the off-chip memories.Compared with related implementations,our design can process larger bipartite graphs.The experimental results on randomly generated graphs show that our design performs better than the CPU implementations.
Keywords/Search Tags:Field-Programmable Gate Array, reconfigurable computing, graph computations, Single Source Shortest Path, minimal spanning tree, Breadth First Search, bipartite graph matching
PDF Full Text Request
Related items