
Research On The Key Techniques Of Programming Model And Compiler Optimization For Many-core GPU

Posted on: 2013-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X B Gan
Full Text: PDF
GTID: 1268330392473880
Subject: Computer Science and Technology
Abstract/Summary:
GPGPU (General-Purpose computing on Graphics Processing Units) has been widely applied to high-performance computing. However, GPU architectures and programming models differ substantially from those of traditional CPUs, making it challenging to develop efficient GPU applications. This thesis focuses on the key techniques of programming model and compiler optimization for many-core GPUs, and addresses a number of key theoretical and technical issues. The primary contributions and innovations are as follows.

1. We propose a many-threaded programming model. There is as yet no authoritative parallel programming model for multi-core and many-core processors. Accordingly, after studying stream-based and classical parallel programming models, we propose the many-threaded programming model ab-Stream, which hides architectural differences and aims to be easy to parallelize, easy to program, easy to extend, and easy to tune.

2. We propose parallelizing approaches with hierarchical computing granularities for mapping GPGPU applications. GPUs contain hundreds of computing cores, yet it is difficult to identify an appropriate computing granularity for mapping GPGPU applications so as to maximize GPU productivity. Oriented toward application inputs, we first propose a parallelizing approach with relaxation for GPU applications characterized by chain-dependent inputs; second, we propose a pixel-level parallelizing approach for mapping GPU applications with 2D inputs. Experimental results show that the proposed approaches are easy to implement and efficiently exploit the potential parallelism in GPGPU applications.

3. We propose memory optimization and data-transfer transformation based on data classification. The GPGPU architecture is a high-performance but memory-bound architecture.
To utilize the diverse GPU storage resources effectively, we first propose data-layout pruning based on memory classification, and then propose TaT (Transfer after Transformed) for transferring strided data between the CPU and the GPU. Experimental results demonstrate that the proposed techniques significantly improve the performance of GPGPU applications.

4. We propose a collaborative framework with load balancing for compute-intensive applications. Heterogeneous systems composed of CPUs and GPUs are often load-imbalanced. To take full advantage of CPU+GPU heterogeneous systems, the proposed collaborative framework overlaps data transfers and computation in a pipelined fashion. Additionally, optimization techniques including zero-loading and cache loading are integrated into the framework to maximize the performance of heterogeneous systems. Experimental results demonstrate that the proposed collaborative framework maximizes the utilization of heterogeneous systems.

To validate the correctness and high productivity of the ab-Stream programming model, we design a prototype, ab-Stream4G, for CUDA-enabled GPUs based on the proposed techniques. Experimental results show that ab-Stream4G works correctly and efficiently.
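The TaT idea in contribution 3 can be sketched as follows. This is a minimal host-side C++ illustration under stated assumptions, not the thesis's actual implementation: the function name `pack_strided` and its signature are hypothetical. The point is that gathering strided elements into a contiguous staging buffer lets one large, bandwidth-friendly CPU-GPU transfer replace many small strided copies.

```cpp
#include <cstddef>
#include <vector>

// Gather `count` elements spaced `stride` elements apart, starting at
// `src`, into the contiguous buffer `dst`. After packing, the whole
// buffer can be moved to the device in a single transfer instead of
// issuing `count` tiny strided copies.
template <typename T>
void pack_strided(const T* src, std::size_t stride, std::size_t count,
                  std::vector<T>& dst) {
    dst.resize(count);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i * stride];  // gather one strided element
    }
}
```

In a CUDA setting, the packed buffer would then be handed to a single cudaMemcpy (or an asynchronous variant), with a matching scatter restoring the layout on the other side; the abstract does not give the details of the thesis's actual TaT implementation.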
Keywords/Search Tags: Many-core GPU, Programming model, Computing granularity, Memory optimization, Collaborative framework with load balance, Compiler optimization