A Compiler For Automatic Translating OpenACC Program To Intel Multicore And Manycore Platform

Posted on:2016-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Jiang

Full Text:PDF

GTID:2298330470957726

Subject:Computer system architecture

Abstract/Summary:

Because CPUs and accelerators have different underlying architectures, a heterogeneous CPU-accelerator system can handle general-purpose computation more efficiently than a homogeneous system. NVIDIA GPUs, AMD GPUs, and the Intel Xeon Phi coprocessor are typical accelerators, and their corresponding programming models are CUDA, OpenCL, and the Xeon Phi directives. However, these native programming models have several limitations. First, native models such as CUDA and OpenCL are complex to program and difficult to optimize. Second, because the computation model of an accelerator differs greatly from that of a CPU, porting legacy programs to accelerators is extremely difficult for programmers. Third, if programs are written in a language tied to a specific hardware platform, the software must be upgraded whenever the hardware is upgraded, and frequent software upgrades place a heavy burden on users. Fourth, because each accelerator has its own programming language, it is difficult for investors to build a hardware platform and choose a development language. The OpenACC standard overcomes these four limitations by using compiler directives to mark the code regions or loops to be offloaded to the accelerator.

This thesis implements a source-to-source translation tool, OpenACC_JX, which automatically translates C source code with OpenACC directives into optimized Intel offload code. This makes it possible to program the Intel Xeon Phi coprocessor with OpenACC and greatly improves MIC programming efficiency for parallel computation. The major achievements are as follows:

(1) We design a source-to-source translation tool. The tool is built on the LLVM compiler infrastructure, mainly its native C/C++ compiler Clang, and is based on LibTooling. We extend Clang's preprocessor, parser, and semantic analyzer to recognize OpenACC directives, and we use its rewriting mechanism to translate the code. The tool makes the Intel Xeon Phi support OpenACC and improves programming efficiency for the Intel Xeon Phi.

(2) We map OpenACC directives to offload directives. The mapping has three parts: task management, data management, and parallelism management.

(3) We implement two optimization techniques to ensure good performance of the translated code: data communication optimization and vectorization optimization. Communication between processors is an important source of time overhead for many applications when parallel programs run, so reducing the cost of data communication is important for overall performance. Vectorization optimization is employed to make full use of the MIC vector processing unit and improve computing speed.

The thesis adopts NPB as the benchmark. Experiments show that our translation tool achieves, on average, 74%, 76%, and 80% of the performance of the hand-written versions for the problem sizes Class A, Class B, and Class C, respectively.
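As a rough illustration of the kind of directive mapping described above, the sketch below places an OpenACC-annotated loop next to one plausible Intel offload equivalent. The function names `saxpy_acc` and `saxpy_offload` are hypothetical, and the particular clause correspondence shown (OpenACC `copyin`/`copy` to offload `in`/`inout`, `parallel loop` to an OpenMP parallel SIMD loop inside the offload region) is an assumption for illustration, not the mapping actually generated by OpenACC_JX. A compiler without OpenACC or Intel offload support simply ignores the unknown pragmas and runs both loops on the host.

```c
#include <stddef.h>

/* Input form: plain C annotated with OpenACC directives.
 * copyin(x) moves x to the accelerator; copy(y) moves y in and back out. */
void saxpy_acc(int n, float alpha, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}

/* Hypothetical output form: Intel offload directives (an assumed mapping).
 *   task management:        acc parallel  -> offload target(mic)
 *   data management:        copyin / copy -> in / inout with length(n)
 *   parallelism management: acc loop      -> omp parallel for
 *   vectorization:          simd clause asks for use of the vector unit */
void saxpy_offload(int n, float alpha, const float *x, float *y)
{
    #pragma offload target(mic) in(x : length(n)) inout(y : length(n))
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

Both functions compute the same result on the host, so the translated form can be checked against the original by comparing output arrays element by element.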

Keywords/Search Tags:

OpenACC, Xeon Phi, multi-core/many-core, source-to-source translation, parallel programming, parallel optimization

Related items

1	Performance Analysis And Optimization Of Current Parallel Programming Models For Many-core Systems
2	Research Of Multi-core CPU And Many-core GPU Accelerated Parallel Optimization Algorithms
3	Application Of Multi-core Parallel Programming Technology To Accelerate Digital Image Processing
4	Research On Directive-based Parallel Language For Sunway Taihulight Supercomputer And Design Of The Compiling Optimization
5	Data Optimization In Parallel Compilation For Heterogeneous Multi-core Processor
6	Parallel Programming And Optimization Based On Multi-core Processors
7	The Research Of Parallel Optimization Of The Multi-core Numerical Algorithm
8	Research And Implementation Of Data Parallel Programming Platform Based On Multi-Core
9	Research Of Multi-core Program Optimization Based On Task Parallel Strategies
10	Download System Research And Development Based On The Parallel Multi-core Environment