
A Compiler for Automatically Translating OpenACC Programs to Intel Multicore and Manycore Platforms

Posted on: 2016-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Jiang
GTID: 2298330470957726
Subject: Computer system architecture
Abstract/Summary:
Because they combine processors with different underlying architectures, CPU-accelerator heterogeneous systems can handle general-purpose computation tasks more efficiently than homogeneous systems. NVIDIA GPUs, AMD GPUs, and Intel Xeon Phi coprocessors are typical accelerators, and their corresponding programming models are CUDA, OpenCL, and the Xeon Phi directives. However, these native programming models have four technical limitations. First, native models such as CUDA and OpenCL are complex to program and difficult to optimize. Second, because the computation models of accelerators and CPUs differ greatly, porting legacy programs to accelerators is an extremely difficult task for programmers. Third, if programs are written in a language specific to one hardware platform, the software must be upgraded whenever the hardware is upgraded, and frequent software upgrades place a heavy burden on users. Fourth, because each accelerator has its own programming language, it is difficult for investors to build a hardware platform and choose a development language. The OpenACC standard can overcome these four limitations by adding compiler directives that identify which regions of code or loops should be offloaded to the accelerator.

This thesis implements a source-to-source translation tool, OpenACC_JX, which automatically translates C source code with OpenACC directives into optimized Intel offload code. This makes it possible to use OpenACC to program the Intel Xeon Phi coprocessor, and it greatly improves MIC programming efficiency for parallel computation. The major achievements cover the following aspects:

(1) We design a source-to-source translation tool. The tool uses the LLVM compiler infrastructure, primarily its native C/C++ compiler, Clang. We extend Clang's preprocessor, parser, and semantic analyzer to recognize OpenACC directives and use a rewriting mechanism to translate the code. The tool is based on LibTooling. It makes the Intel Xeon Phi support OpenACC and improves programming efficiency for the Intel Xeon Phi.

(2) We map OpenACC directives to Offload directives. The mapping has three parts: task management, data management, and parallelism management.

(3) We implement two kinds of optimization to ensure good performance of the translated code: data communication optimization and vectorization optimization. Communication between processors is an important source of time overhead for many applications when parallel programs run, so reducing the cost of data communication operations is important for overall performance. Vectorization optimization is employed to make full use of the MIC vector processing unit and improve computing speed.

The thesis adopts NPB as the benchmark. Experiments show that the translated code achieves, on average, 74%, 76%, and 80% of the performance of the hand-written versions for problem sizes Class A, Class B, and Class C, respectively.
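The directive mapping described in (2) can be illustrated with a small, hypothetical before/after pair in C. This is only a sketch, not output produced by OpenACC_JX: the function names `vec_add_acc` and `vec_add_offload` are invented for illustration, and the particular clause pairing shown (`copyin` to `in`, `copyout` to `out`, `parallel loop` to an OpenMP parallel loop inside an offload region) is an assumption about what such a mapping might look like. The `_OPENACC` and `__INTEL_OFFLOAD` guards let the file build as plain serial C on compilers that support neither model.

```c
#include <assert.h>
#include <stddef.h>

/* Input form (hypothetical): a vector addition annotated with OpenACC.
 * copyin/copyout describe data movement; "parallel loop" marks the
 * loop to be run in parallel on the accelerator. */
void vec_add_acc(const float *a, const float *b, float *c, int n)
{
    int i;
    assert(n >= 0);
#ifdef _OPENACC
#pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
#endif
    for (i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Output form (assumed, not taken from the thesis): the same loop
 * expressed with Intel offload directives.
 *   task management:        #pragma offload target(mic)
 *   data management:        in(...) / out(...) with length(n)
 *   parallelism management: OpenMP parallel loop, with simd to
 *                           request vectorization */
void vec_add_offload(const float *a, const float *b, float *c, int n)
{
    int i;
    assert(n >= 0);
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(a : length(n)) in(b : length(n)) out(c : length(n))
#pragma omp parallel for simd
#endif
    for (i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result, so a translated program can be checked against its OpenACC original by comparing their outputs on the host.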
Keywords/Search Tags: OpenACC, Xeon Phi, multicore/manycore, source-to-source translation, parallel programming, parallel optimization