
Research And Implementation On A Compiler Framework For Translating ANSI C Into CUDA C

Posted on: 2012-08-11    Degree: Master    Type: Thesis
Country: China    Candidate: Q Zhu    Full Text: PDF
GTID: 2218330362960097    Subject: Computer Science and Technology
Abstract/Summary:
Recently, the GPU (Graphics Processing Unit) has been widely used in high-performance computing applications such as biomedicine, financial analysis, physical simulation, and database processing because of its powerful computing capability. GPGPU (General-Purpose computation on GPU) refers to using the GPU in areas other than graphics rendering. Making the best use of GPU computing resources is challenging because of the complexity of the hardware. CUDA (Compute Unified Device Architecture), introduced by NVIDIA Corporation, provides an efficient solution for managing many-core processors such as GPUs. Compared with previous programming models, CUDA brings two improvements: a unified processing architecture and on-chip shared memory. These improvements make the GPU more suitable for general-purpose computing. However, because of the multi-level thread structure and memory hierarchy of CUDA-enabled GPUs, programmers must still be familiar with the underlying architecture in order to develop high-performance applications.

We propose a source-to-source compiler framework to reduce this burden on GPU programmers. The framework not only automatically generates applications that execute on a heterogeneous system composed of a CPU and a GPU, but also performs optimizations that improve the parallelism of the applications. The innovative work in this thesis can be summarized as follows.

1. A compiler framework, named ICuda, for translating ANSI C into CUDA code is proposed. ICuda releases the programmer from the details of the GPU architecture and CUDA, and improves the efficiency of developing high-performance parallel programs. Whereas most existing frameworks handle only determinant- or matrix-based applications, ICuda is able to deal with general applications.

2. A scheduling approach for parallelizing loop structures is proposed. When serial code is parallelized, the data and loop structures must be transformed to fit the many-core architecture of the programming model. Accordingly, we propose a scheduling approach combining subscript transformation with the distribution of accesses to shared variables. Before parallelization, nested sequential loops are flattened into a single sequential loop to simplify data indexing; during parallelization, accesses to a shared variable are distributed across separate copies to reduce memory access overhead (both steps are sketched after this abstract).

3. A novel memory access optimization for CUDA is proposed. The single most important performance consideration in programming for the CUDA architecture is coalescing global memory accesses, yet it is difficult to transform non-coalesced accesses into coalesced ones. As an alternative, we propose an efficient strategy that optimizes memory access by binding read-only data to the texture memory space (sketched after this abstract).

We implement the ICuda framework on top of the SUIF2 system, adopt Parboil as the benchmark suite, and design a detailed scheme for performance testing and analysis. Experimental results demonstrate the validity and efficiency of the optimization methods and the ICuda framework presented in this thesis.
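The loop-flattening step of contribution 2 can be pictured with a small hand-written sketch; it is not ICuda's generated code. Assuming an element-wise addition over an N x M array (the sizes, kernel name, and block size are arbitrary choices of the sketch), the nested i/j loop collapses into one flattened index k = i*M + j that maps directly onto a one-dimensional CUDA thread index.

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Hypothetical problem sizes, used only for illustration. */
    #define N 1024
    #define M 512

    /* Serial original (conceptually):
     *   for (i = 0; i < N; i++)
     *     for (j = 0; j < M; j++)
     *       c[i][j] = a[i][j] + b[i][j];
     * After flattening, the single index k = i*M + j replaces the nested
     * indices and maps onto a one-dimensional thread index. */
    __global__ void add_flattened(const float *a, const float *b,
                                  float *c, int total)
    {
        int k = blockIdx.x * blockDim.x + threadIdx.x;  /* flattened index */
        if (k < total)                                  /* guard the tail block */
            c[k] = a[k] + b[k];
    }

    int main(void)
    {
        int total = N * M, k;
        size_t bytes = total * sizeof(float);
        float *h_a = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
        float *d_a, *d_b, *d_c;

        for (k = 0; k < total; k++) h_a[k] = 1.0f;
        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_a, bytes, cudaMemcpyHostToDevice); /* reuse h_a as b */

        add_flattened<<<(total + 255) / 256, 256>>>(d_a, d_b, d_c, total);
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", h_c[0]);                       /* expect 2.0 */

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_c);
        return 0;
    }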
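Distributing accesses to a shared variable across copies can likewise be sketched as a standard privatized reduction; again this is an illustration rather than the framework's output, and the kernel name, block size, and host-side combination are assumptions of the sketch. Each thread updates its own slot of on-chip shared memory instead of contending for a single accumulator, and the per-thread copies are then combined by a tree reduction within the block.

    #include <stdio.h>
    #include <cuda_runtime.h>

    #define THREADS 256  /* block size, chosen arbitrarily for the sketch */

    /* Each thread accumulates into its own copy of the shared variable
     * (one slot per thread in on-chip shared memory); the copies are
     * combined by a tree reduction inside the block. */
    __global__ void sum_privatized(const float *in, float *block_sums, int n)
    {
        __shared__ float partial[THREADS];
        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;

        partial[tid] = (i < n) ? in[i] : 0.0f;   /* private copy per thread */
        __syncthreads();

        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            block_sums[blockIdx.x] = partial[0]; /* one value per block */
    }

    int main(void)
    {
        const int n = 1 << 20;
        const int blocks = (n + THREADS - 1) / THREADS;
        float *h_in = (float *)malloc(n * sizeof(float));
        float *h_bs = (float *)malloc(blocks * sizeof(float));
        float *d_in, *d_bs;
        double sum = 0.0;
        int i;

        for (i = 0; i < n; i++) h_in[i] = 1.0f;
        cudaMalloc((void **)&d_in, n * sizeof(float));
        cudaMalloc((void **)&d_bs, blocks * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

        sum_privatized<<<blocks, THREADS>>>(d_in, d_bs, n);
        cudaMemcpy(h_bs, d_bs, blocks * sizeof(float), cudaMemcpyDeviceToHost);

        for (i = 0; i < blocks; i++) sum += h_bs[i];  /* combine block results */
        printf("sum = %.0f (expect %d)\n", sum, n);

        cudaFree(d_in); cudaFree(d_bs);
        free(h_in); free(h_bs);
        return 0;
    }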
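The texture-binding optimization of contribution 3 can be illustrated with the legacy texture-reference API that CUDA offered at the time of the thesis (removed in CUDA 12 and later, where texture objects or __ldg() serve the same purpose). The sketch binds a read-only input buffer to a texture reference so that its reads go through the cached texture path; the kernel and buffer names are placeholders, not identifiers from the thesis.

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Legacy texture reference for read-only, linearly addressed data. */
    texture<float, 1, cudaReadModeElementType> texIn;

    /* Reads of the read-only input go through the cached texture path
     * rather than through (possibly uncoalesced) global loads. */
    __global__ void scale(float *out, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = factor * tex1Dfetch(texIn, i);
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *h = (float *)malloc(n * sizeof(float));
        float *d_in, *d_out;
        int i;

        for (i = 0; i < n; i++) h[i] = (float)i;
        cudaMalloc((void **)&d_in,  n * sizeof(float));
        cudaMalloc((void **)&d_out, n * sizeof(float));
        cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

        /* Bind the read-only device buffer to the texture reference. */
        cudaBindTexture(NULL, texIn, d_in, n * sizeof(float));
        scale<<<(n + 255) / 256, 256>>>(d_out, 2.0f, n);
        cudaUnbindTexture(texIn);

        cudaMemcpy(h, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("out[10] = %f\n", h[10]);   /* expect 20.0 */

        cudaFree(d_in); cudaFree(d_out);
        free(h);
        return 0;
    }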
Keywords/Search Tags: GPGPU, CUDA, parallelization, compilation optimization