
Research On Key Technologies Of GPU Architecture Optimization For Graph Computing

Posted on: 2019-06-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H Guo | Full Text: PDF
GTID: 1360330611493110 | Subject: Computer Science and Technology
Abstract/Summary:
Large-scale graph processing is a critical component of many data analysis applications. As one of the most basic abstract data structures, graphs represent relationships between objects and serve as the data representation in applications such as web page ranking, social networks, tracking the effects of drugs on cells, genetics, and the spread of infectious diseases. GPUs have become a mainstream massively parallel computing platform and can achieve higher performance at lower energy cost than multiprocessors. Many graph computing programming models therefore use GPUs to accelerate graph computing algorithms. To address the load imbalance and control divergence that arise when accelerating graph computing algorithms on GPUs, we analyse the instruction execution and memory access behaviour of these algorithms on GPUs. From the GPU architecture point of view, we propose hardware support that addresses load imbalance, low data cache utilization, and high data access latency, and thereby improves the performance of graph computing on GPUs. Our contributions are as follows:

1. We propose a dynamic multi-grain cache management mechanism for GPUs. Accelerating graph computing algorithms on GPUs runs into problems such as memory divergence, fine-grain data access, and low utilization of the on-chip data cache, which prevent the GPU from delivering its full computing power. The proposed mechanism targets the mismatch between the small size of the requested data and coarse-grain cache management. We design and implement the hardware cache management unit on a simulator. The mechanism addresses the low utilization of the L1 data cache and improves the throughput of applications with irregular memory accesses on GPUs. Experimental results show that, compared with the current GPU L1 data cache and with other fine-grain cache management mechanisms, the proposed mechanism improves both the space utilization of the L1 data cache and application performance.

2. We propose a data-structure-aware prefetching technique. Data prefetching techniques work well for fixed data access patterns, but no effective prefetching technique has existed for irregular data accesses. We analyse the data structure access pattern of breadth-first search, propose data-structure-aware prefetching, and design and implement a hardware data prefetching unit and a programming interface on the simulator. The technique uses explicit information about accesses to the graph data structure to improve prefetching accuracy, reduce the memory bandwidth overhead of prefetching, and reduce data access latency. Experimental results show that, compared with existing data prefetching techniques, the proposed technique improves prefetching accuracy for irregular data accesses, reduces memory access latency, and dramatically improves GPU performance.

3. We propose a novel, highly efficient GPU architecture for graph computing. Although many GPU-based graph computing programming models have been proposed, they do not fully solve the load balancing problem of graph computing. We analyse software graph computing programming models, propose the new GPU architecture, and design and implement the hardware architecture and its programming model on the simulator. The architecture eliminates the overhead of load balancing pre-computation and achieves efficient load balancing both within a GPU core and across GPU cores. Experimental results show that, compared with software-implemented graph computing programming models, the proposed GPU architecture effectively reduces load balancing overhead and dramatically improves GPU throughput.
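
The load imbalance and irregular access problems named in the abstract can be illustrated with a small software model. The sketch below is not the dissertation's hardware mechanism; it is a minimal Python illustration that assumes a compressed sparse row (CSR) graph layout and a simple "one thread per frontier vertex" mapping, neither of which is confirmed by the abstract. All function names (`build_csr`, `bfs_levels`, `warp_lane_utilization`) are hypothetical. It runs a level-synchronous breadth-first search and estimates what fraction of warp lanes would do useful work if each warp had to wait for its highest-degree vertex.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def build_csr(num_vertices, edges):
    """Build CSR arrays (row_offsets, col_indices) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + degree[v]
    col_indices = [0] * len(edges)
    cursor = list(row_offsets[:-1])  # next free slot for each source vertex
    for src, dst in edges:
        col_indices[cursor[src]] = dst
        cursor[src] += 1
    return row_offsets, col_indices


def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS; returns per-vertex levels and each frontier."""
    num_vertices = len(row_offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    frontiers = []
    while frontier:
        frontiers.append(frontier)
        next_frontier = []
        for v in frontier:
            # Dependent access chain: frontier -> row_offsets -> col_indices
            # -> level. The addresses are data dependent, so the pattern is
            # irregular from the hardware's point of view.
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if level[u] == -1:
                    level[u] = level[v] + 1
                    next_frontier.append(u)
        frontier = next_frontier
    return level, frontiers


def warp_lane_utilization(row_offsets, frontier, warp_size=WARP_SIZE):
    """Estimate useful lane-iterations / issued lane-iterations.

    Assumes each thread expands one frontier vertex and that a warp stays
    busy until its highest-degree vertex is finished (a simplified model).
    """
    useful = 0
    issued = 0
    for start in range(0, len(frontier), warp_size):
        chunk = frontier[start:start + warp_size]
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in chunk]
        useful += sum(degrees)
        issued += warp_size * max(degrees)
    return useful / issued if issued else 1.0


if __name__ == "__main__":
    # A star graph: vertex 0 points to 63 neighbours, vertex 1 to one more.
    edges = [(0, i) for i in range(1, 64)] + [(1, 64)]
    row_offsets, col_indices = build_csr(65, edges)
    level, frontiers = bfs_levels(row_offsets, col_indices, 0)
    for depth, frontier in enumerate(frontiers):
        print(depth, len(frontier), warp_lane_utilization(row_offsets, frontier))
```

In this model a single high-degree vertex in a frontier leaves most lanes of its warp idle, which is the kind of imbalance that software load balancing schemes spend pre-computation on and that the proposed architecture aims to handle in hardware.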
Keywords/Search Tags: GPU, Large-scale graph computing, Data cache management, Data prefetching, Load balance