
Research On The Key Techniques Of Programming Model And Compiler Optimization For Many-core GPU

Posted on: 2013-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X B Gan
Full Text: PDF
GTID: 1268330392473880
Subject: Computer Science and Technology
Abstract/Summary:
GPGPU (General-Purpose computing on Graphics Processing Units) has been widely applied to high-performance computing. However, GPU architectures and programming models differ substantially from those of traditional CPUs, making it challenging to develop efficient GPU applications. This thesis focuses on the key techniques of programming model and compiler optimization for many-core GPUs, and addresses a number of key theoretical and technical issues. The primary contributions and innovations are as follows.

1. We propose a many-threaded programming model. There is as yet no authoritative parallel programming model for multi-core and many-core processors. Accordingly, after studying stream-based and classical parallel programming models, we propose the many-threaded programming model ab-Stream, which hides architectural differences and aims to be easy to parallelize, easy to program, easy to extend, and easy to tune.

2. We propose parallelizing approaches with hierarchical computing granularities for mapping GPGPU applications. GPUs contain hundreds of computing cores, yet it is difficult to identify an appropriate computing granularity for mapping GPGPU applications so as to maximize GPU productivity. Oriented toward application inputs, we first propose a parallelizing approach with relaxation for GPU applications characterized by chain-dependent inputs; second, we propose a pixel-level parallelizing approach for mapping GPU applications with 2D inputs. Experimental results show that the proposed approaches are easy to implement and efficiently exploit the potential parallelism in GPGPU applications.

3. We propose memory optimization and data-transfer transformation based on data classification. The GPGPU architecture is a high-performance but memory-bound architecture.
To utilize the diverse GPU storage resources effectively, we first propose data-layout pruning based on memory classification, and then propose TaT (Transfer after Transformed) for transferring strided data between the CPU and the GPU. Experimental results demonstrate that the proposed techniques significantly improve the performance of GPGPU applications.

4. We propose a collaborative framework with load balancing for compute-intensive applications. Heterogeneous systems composed of CPUs and GPUs are often load-imbalanced. To take full advantage of CPU+GPU heterogeneous systems, the proposed collaborative framework overlaps data transfers and computation in a pipelined fashion. Additionally, optimization techniques including zero-loading and cache loading are integrated into the framework to maximize the performance of heterogeneous systems. Experimental results demonstrate that the proposed collaborative framework maximizes the utilization of heterogeneous systems.

To validate the correctness and high productivity of the ab-Stream programming model, we design a prototype, ab-Stream4G, for CUDA-enabled GPUs based on the proposed techniques. Experimental results show that ab-Stream4G works correctly and efficiently.
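The TaT idea in contribution 3 can be sketched as follows. This is a minimal host-side C++ illustration under stated assumptions, not the thesis's actual implementation: the function name `pack_strided` and its signature are hypothetical. The point is that gathering strided elements into a contiguous staging buffer lets one large, bandwidth-friendly CPU-GPU transfer replace many small strided copies.

```cpp
#include <cstddef>
#include <vector>

// Gather `count` elements spaced `stride` elements apart, starting at
// `src`, into the contiguous buffer `dst`. After packing, the whole
// buffer can be moved to the device in a single transfer instead of
// issuing `count` tiny strided copies.
template <typename T>
void pack_strided(const T* src, std::size_t stride, std::size_t count,
                  std::vector<T>& dst) {
    dst.resize(count);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i * stride];  // gather one strided element
    }
}
```

In a CUDA setting, the packed buffer would then be handed to a single cudaMemcpy (or an asynchronous variant), with a matching scatter restoring the layout on the other side; the abstract does not give the details of the thesis's actual TaT implementation.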
Keywords/Search Tags: Many-core GPU, Programming model, Computing granularity, Memory optimization, Collaborative framework with load balance, Compiler optimization