Font Size: a A A

Implementation And Optimization Of Graph Algorithms On GPGPU

Posted on:2017-06-03Degree:MasterType:Thesis
Country:ChinaCandidate:P F LiFull Text:PDF
GTID:2428330569498527Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Today,the use of general purpose graphics processing units(GPGPUs)for high-performance parallel computing has become increasingly popular.GPU's high computational throughput and high bandwidth make it an ideal platform for accelerating large-scale data parallel applications.Many applications,such as multimedia,data mining and bioinformatics,have achieved higher performance and energy efficiency than traditional CPUs on GPU,but most of these applications have the data structure and parallel access mode of "friendly" rules for GPU.Graph processing algorithms are the basis for many applications.Many problems in high-performance computing and business applications are represented in graph,such as incomplete LU decomposition,electronic design automation,data mining,Web analysis,and social network analysis.Because of the characteristics of the graph,there are many challenges in accelerating the graph algorithm on the GPU: the graph algorithm is input-dependent and unpredictable,showing significant thread divergence and access divergence on the GPU,and the load imbalance between threads is also a common challenge.In order to solve these problems,this paper research implementation and optimization of graph algorithms on the GPU.The main contributions are as follow:1)A scalable and efficient graph coloring algorithm is implemented on the GPGPU.The algorithm is based on the greedy algorithm of which the coloring quality is close to the optimal solution.When storing the colors cannot be assigned by vertices,we design a bit vector data structure,using a single bit to represent a color.Compared to the Boolean type,using bit vector can reduces storage overhead,improves data reusability,and reduces access to the high-latency local memory.The algorithm improves the performance of the access memory by buffering the read-only data cache in response to the read-only feature of the graph.The experimental results show that the performance of this algorithm is 2.3 times of that of NVIDIA CUSPARSE library routine and needs less colors,2.6 times of that of the serial First-Fit algorithm.At the same time,this implementation scales well for sparse graph.2)An efficient strong connected component decomposition algorithm is implemented on the GPGPU.This algorithm combines the topological structure of the real-world graphs,uses the FB algorithm to compute the giant strongest connected component in the graph firstly,with load balancing;then uses Coloring algorithm to perform a color propagation,which divides the remaining vertices into more and smaller sub-graphs,finally performs FB decomposition in parallel for each sub-graph.Through the hybrid algorithm,the parallelism has improved,and the computing resources of the GPU are fully utilized,the number of iterations is reduced,and the solving process is accelerated.The experimental results show that the performance of this algorithm is 5.4 times that of the serial Tarjan's algorithm,1.4 times of the OpenMP implementation of Hong et al,40 times of that of Stuhl's CUDA implementation.
Keywords/Search Tags:GPGPU, Graph Algorithms, Graph Coloring, Strongly Connected Component Decomposition
PDF Full Text Request
Related items