Architectural Support and Compiler Optimization for Many-Core Architectures

Posted on:2014-02-22

Degree:Ph.D

Type:Dissertation

University:North Carolina State University

Candidate:Yang, Yi

GTID:1458390005492229

Subject:Engineering

Abstract/Summary:

Many-core architectures, such as general-purpose graphics processing units (GPGPUs) and the Intel Many Integrated Core (MIC) architecture, have been exploited to achieve teraflops of computing capability on a single chip. This dissertation proposes both architectural improvements and compiler optimizations for many-core architectures.

First, to fully utilize the power of GPGPUs, application developers have to consider platform-specific optimizations very carefully. To relieve application developers of this workload, we develop a source-to-source compiler that takes a fine-grained GPGPU program as input and generates an optimized GPGPU program by applying a set of optimization techniques.

Second, Intel MIC employs a directive-based programming model that aims to simplify program development. However, when adapting legacy programs to Intel MIC, several issues need to be addressed: 1) how to identify code sections that are parallelizable and profitable to run on Intel MIC; 2) how to automatically generate the MIC program; and 3) how to minimize memory transfers between the CPU and the MIC. We develop a compiler framework, called Apricot, that facilitates program development by addressing these issues.

Third, shared memory is a software-managed cache on GPGPUs and is critical to the performance of GPGPU programs. We propose three software solutions and one hardware solution to mitigate the impact of poor thread-level parallelism (TLP) caused by heavy use of shared memory. While our software approaches work on existing GPGPU hardware, our hardware approach shows significant performance benefits at a small hardware cost.

Last, we model a fused heterogeneous architecture that integrates a CPU and a GPU on a single chip with a shared last-level cache and off-chip memory. We then advocate using the idle CPU to prefetch data into the last-level cache for GPGPU programs. The experimental results show that our proposed technique can greatly improve the performance of GPU programs.

Keywords/Search Tags:

GPGPU, MIC, Compiler, Optimization

Related items

1	Research And Implementation On Compiler Framework For Translating ANSI C Into CUDA C
2	Research On Compiler Optimization Technologies For THUMP
3	A Quick And Generic Approach Of Selecting Compiler Optimization Options
4	Research On Cross-Platform Compiler Analysis And Optimization Technique Based On Peak Architecture
5	Data Sharing Optimization On CPU-GPGPU Shared Last Level Cache System
6	Research Of Automatic Compiler Tuning Based On Machine Learning
7	Automatically constructing compiler optimization heuristics using supervised learning
8	Research On Low Power Compiler Optimization Algorithm And Software Power Analysis Technology
9	Research And Implementation Of Program Compiler Security Option Technology In Linux System
10	Eliminating scope and selection restrictions in compiler optimization