Font Size: a A A

Research On Key Techniques Of Large Scale Graph Mining Based On GPU Hybrid Architecture

Posted on:2016-12-04Degree:DoctorType:Dissertation
Country:ChinaCandidate:B YangFull Text:PDF
GTID:1108330509460984Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph as one of the most basic data structures, in numerous bioinformatics, chemical data analysis, social network research and program bug detection, and many other application fields is used to model the complex relationship between objects. With the continuous development of these applications, the graph data mining as a key foundation of the application becomes more and more importance and its domain is quickly expanding. Because the rapid growth of graph size in these application and the high computational complexity of most graph mining algorithms, so the large scale graph mining urgently needs the support of high performance computing. Recently, due to the performance and energy efficiency advantages of GPU, it gradually has been widely applied in many general computing domain, and provides the opportunity for efficient large scale graph mining. For some important problems in the large scale graph mining such as graph traversal, graph analysis, subgraph isomorphism and frequent graph mining, we analysis the finegrain data level parallelism of the typical algorithm for the problems, and propose corresponding GPU based algorithms, and solve many technical problems in the design of GPU based fine-grain data level algorithms, and achieve the performance improvement of large scale graph mining. In this paper, the important contributions are as follows:1. GPU based large scale graph traversalThis paper proposes an improved vertex-frontier based GPU breadth-first search algorithm, which solve the performance bottleneck in the two stage of each iteration of the existing frontier based algorithm. The main work is as follow: In the neighbor collection stage, to alleviate the workload imbalance in GPU warp caused by task distribution method of the existing top-down algorithm, a virtual queue based task distribution method is introduced. For the limitation of the local deduplication method in existing algorithms, a novel global deduplication method for edge frontier is proposed. For some iterations of BFS on the scale-free or small world graphs, we propose a GPU bottom-up algorithm to avoid the costly deduplication and present a GPU hybrid method, which integrates the top-down method and bottom-up method. Experimental results show that our algorithm achieves the highest 3.2 times speedup compared with the state-of-the-art GPU algorithm.2. GPU based large scale graph analysisThis paper proposes GPU based betweenness centrality calculation algorithms. For the shortest path calculation and dependency accumulation stage of betweenness centrality calculation, to alleviate the workload imbalance of existing algorithm and avoid the utilization of atomic operation, we propose a frontier queue based method, which uses the virtual queue based task distribution method and global deduplication method. We propose a collection based approach to compute the number of shortest paths of each vertex, which avoid the data race among threads during computing the number of shortest paths. Then we propose an improved vertex-based parallel method. Finally, we propose a hybrid method, which efficiently integrates the advantages of the frontier based method, and the vertex-based parallel method. Experimental results show that our hybrid algorithm outperforms the state-of-the-art algorithm by 1.2-1.9 and has good scalability.3. GPU based large scale subgraph isomorphismThis paper first presents a GPU based on graph traversal based subgraph isomorphism algorithm, which uses region exploration to determine matching order and mainly consists of GPU region exploration and GPU subgraph matching. The GPU region exploration iteratively divides the region exploration into expanding and backtracking the partial subtree embedding by the graph traversal, and process these embeddings in parallel using GPU warps. The GPU subgraph matching generates the embeddings according the matching order obtained in parallel. Moreover, in order to improve its performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and warp-centric programming model to balance the workload among threads in a warp. The prototype of GPUSI has been implemented, and runs comprehensive experiments of various graph isomorphism search on diverse large graphs. The results clearly demonstrate that the GPUSI has good scalability and can achieve speedups of 1.4–2.6 when compared to the state-of-the-art solutions.4. CPU/GPU heterogeneous platforms based large scale frequent subgraph miningThis paper proposes a CPU/GPU heterogeneous platform based frequent subgraph mining algorithm, and we efficiently exploit the coarse-grain and fine-grain parallelism of the g Span algorithm. Our work is mainly as follow: First, we propose a virtual queue based approach to solve the load imbalance during pattern expansion. Second, we propose two novel support computation method, which has lower time complexity for specific graph dataset than the existing method. Third, due the low degree of parallelism of minimum DFS code checking, we propose a CPU based coarse-grain parallel algorithm for minimum DFS code checking, and we present a pipeline strategy to hide the communication in CPU/GPU collaborative computing. Finally, we propose a GPU algorithm to perform embedding growth in parallel. Compared with the classical g Span algorithm and existing GPU g Span algorithm, our algorithm achieves about 17 times and 3.7 times speedup respectively.
Keywords/Search Tags:GPU, large scale graph mining, BFS, betweenness centrality, subgraph isomorphism, frequent subgraph mining, graph clustering
PDF Full Text Request
Related items