
Research On Key Techniques Of GPU-Based Stream Graph Computing

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Fang
Full Text: PDF
GTID: 2428330611455054
Subject: Engineering
Abstract/Summary:
In the 21st century, people have entered the era of big data. With the rise of the Internet and cloud computing, and especially the combination of mobile devices and the Internet, everyone generates and consumes large amounts of data at every moment. For service providers, how to process these data and extract valuable information from them has become a difficult problem. To address it, many big data processing frameworks have been developed. However, because of the sheer volume of data, the traditional CPU-based computing model can no longer meet the computational demands of massive data sets. The GPU, on the other hand, is naturally suited to parallel computing by virtue of its architecture and has gradually attracted attention, and more and more computing frameworks now provide GPU-based versions to improve performance.

Although GPUs are inherently suitable for parallel computing, their limited memory capacity, the high cost of exchanging data with host memory, and their scarcity of logic control units mean that a single GPU can only process relatively small-scale data and can only perform computation tasks that decompose easily; it cannot carry out complex logical decisions the way a CPU can. In graph computing in particular, the synchronization overhead caused by uneven thread load within a warp, and the non-coalesced memory accesses caused by the irregularity of graphs, form bottlenecks for GPU performance.

To explore the direction and feasibility of GPU performance optimization more clearly, this thesis proceeds from a basic idea through several transitional models and finally arrives at a two-stage load scheduling algorithm based on binary search/interpolation search, which yields a substantial performance improvement. In the binary search, each thread locates its own unit of work through a prefix-sum array. In the best case the target value is found in a single probe; in the worst case 2 log N memory accesses are required, since in each iteration a thread performs two memory accesses to read the left and right elements, compares them, and updates the midpoint m. Moreover, because the concurrently searching threads must wait at a barrier for synchronization, and at least one thread performs 2 log N operations, the complexity of a complete binary search is 2 log N.

To confirm the algorithm's effect, multiple experiments were designed in which the GPU executes four common graph computing algorithms to measure the performance improvement. To give a clearer picture of GPU performance, we also designed several control groups that run the same algorithms on the CPU and on the GPU under several other scheduling models. Further, we used different data sets, iteration counts, and sample sizes to explore the advantages and disadvantages of this scheme for performance optimization. The experiments demonstrate the effectiveness of the algorithm in terms of average warp occupancy and per-iteration execution time, and also identify the block size that yields the best performance.
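The per-thread binary search over a prefix-sum array described above can be sketched as follows. This is a minimal illustration, not the thesis's actual GPU implementation: the names `find_work_unit` and `prefix_sum` are assumed, and the prefix-sum array is taken to hold running totals of per-vertex work (e.g. vertex out-degrees), so that a thread with global work index `tid` finds the vertex whose edge range contains it.

```python
def find_work_unit(prefix_sum, tid):
    """Locate the index i such that prefix_sum[i] <= tid < prefix_sum[i+1],
    i.e. the work unit (vertex) whose range covers global work item `tid`.
    Each loop iteration reads the array and updates the midpoint m,
    mirroring the per-iteration memory accesses described in the abstract."""
    lo, hi = 0, len(prefix_sum) - 1
    while lo < hi:
        m = (lo + hi + 1) // 2      # upper midpoint to guarantee progress
        if prefix_sum[m] <= tid:    # target lies at or to the right of m
            lo = m
        else:                       # target lies strictly to the left of m
            hi = m - 1
    return lo

# Example: three vertices with degrees 3, 2, 4 give the prefix sums below;
# edge 4 (0-based) belongs to the second vertex (index 1).
prefix = [0, 3, 5, 9]
print(find_work_unit(prefix, 4))  # → 1
```

On a GPU, every thread in a warp would run this search concurrently against the shared prefix-sum array, so the warp as a whole pays the worst-case 2 log N cost before the barrier releases.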
Keywords/Search Tags: GPU, Graph Computing, Coalesced memory access, Binary search, Workload balancing