
Architectural Support and Compiler Optimization for Many-Core Architectures

Posted on: 2014-02-22
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Yang, Yi
Full Text: PDF
GTID: 1458390005492229
Subject: Engineering
Abstract/Summary:
Many-core architectures, such as graphics processing units used for general-purpose computation (GPGPUs) and the Intel Many Integrated Core (MIC) architecture, have been exploited to achieve teraflops of computing capability on a single chip. This dissertation proposes both architectural improvements and compiler optimizations for many-core architectures.

First, to fully utilize the power of GPGPUs, application developers have to consider platform-specific optimizations very carefully. To relieve application developers of this workload, we develop a source-to-source compiler that takes a fine-grained GPGPU program as input and generates an optimized GPGPU program by applying a set of optimization techniques.

Second, Intel MIC employs a directive-based programming model that aims to simplify program development. However, when adapting legacy programs to Intel MIC, several issues need to be addressed: 1) how to identify the profitable and parallelizable code sections for Intel MIC; 2) how to automatically generate the MIC program; and 3) how to minimize memory transfers between the CPU and the MIC. We develop a compiler framework, called Apricot, that facilitates program development by addressing these issues.

Third, shared memory is a software-managed cache on GPGPUs and is critical to the performance of GPGPU programs. We advocate three software solutions and one hardware solution to mitigate the impact of poor thread-level parallelism (TLP) caused by heavy usage of shared memory. While our software approaches work on existing GPGPU hardware, our hardware approach shows a significant performance benefit at a small hardware cost.

Last, we model a fused heterogeneous architecture that integrates a CPU and a GPU on a single chip with a shared last-level cache and off-chip memory. We then advocate using the idle CPU to prefetch data into the last-level cache for GPGPU programs. The experimental results show that our proposed technique can greatly improve the performance of GPU programs.
Keywords/Search Tags: GPGPU, MIC, Compiler, Optimization