Evaluating And Optimizing Scientific Applications On Many-Core Platforms

Posted on:2015-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:X Gao

Full Text:PDF

GTID:2348330509960883

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, as processor technology has continued to advance, using many-core processors as accelerators for heterogeneous parallel computing has become an important trend in high performance computing (HPC). The mainstream many-core products currently include the general-purpose GPU (Graphics Processing Unit) and the recently launched MIC (Many Integrated Core). Although heterogeneous many-core architectures offer high peak performance, they are very difficult to program, which poses great challenges for developing, maintaining and migrating applications. The appearance of cross-platform parallel programming models such as OpenCL (Open Computing Language) and OpenACC offers a chance to simplify application migration: code written in such models can be executed without modification on various platforms such as CPU, GPU and MIC. But what performance does such code obtain, and can it fully utilize the hardware resources after appropriate adjustment for each platform? These have become urgent questions for the HPC community.

This thesis is based on the MIC platform and the OpenCL programming model, and analyzes in depth the factors that may affect the performance of scientific applications on the many-core platform. For each issue identified, a systematic optimization method is proposed. The main contributions are as follows:

(1) By evaluating and optimizing in depth the performance of several OpenCL scientific applications on the MIC processor, we identify the main factors that affect performance, each in its own specific way. Different forms of a single factor can change performance by more than 7 times in the most extreme case. Furthermore, we identify two critical factors: vectorization and the memory access pattern.

(2) For vectorization, we propose two explicit vectorization methods based on the OpenCL vector data types. Applying these methods to several scientific computing kernels, we find that explicit vectorization improves computing performance and also improves bandwidth by changing the data access pattern. With vector data types, performance improves by up to nearly 16 times. Compared with the compiler's implicit automatic vectorization, there is little difference in performance, but the explicit methods are more flexible and controllable.

(3) For memory access, we first analyze the use of OpenCL local memory on MIC and find that whether an application benefits from local memory depends on the application itself, so its use can be seen as a software optimization technique. Then, based on an analysis of the Ops/Byte ratio of stencil computation, we propose a parallel temporal-spatial hybrid blocking algorithm to improve cache reuse efficiency. Applied to the seven-point stencil, it improves performance by 1.5 times compared with using spatial blocking alone.
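As an illustration of the spatial blocking baseline mentioned in contribution (3), the following is a minimal sketch in plain C of one sweep of a seven-point stencil over a cubic grid, written once with a naive loop nest and once with the y and z loops tiled. It is not the thesis's hybrid temporal-spatial algorithm and not its OpenCL code. The function names, the coefficients `a` and `b`, the cubic grid of side `n`, and the block sizes `by` and `bz` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flatten (x, y, z) into an index for an n*n*n grid, x varying fastest. */
static size_t idx(int x, int y, int z, int n) {
    return ((size_t)z * (size_t)n + (size_t)y) * (size_t)n + (size_t)x;
}

/* Seven-point update: weighted centre plus its six face neighbours. */
static double point7(const double *in, int x, int y, int z, int n,
                     double a, double b) {
    return a * in[idx(x, y, z, n)]
         + b * (in[idx(x - 1, y, z, n)] + in[idx(x + 1, y, z, n)]
              + in[idx(x, y - 1, z, n)] + in[idx(x, y + 1, z, n)]
              + in[idx(x, y, z - 1, n)] + in[idx(x, y, z + 1, n)]);
}

/* One sweep over all interior points in plain z, y, x order. */
void stencil_naive(const double *in, double *out, int n, double a, double b) {
    for (int z = 1; z < n - 1; z++)
        for (int y = 1; y < n - 1; y++)
            for (int x = 1; x < n - 1; x++)
                out[idx(x, y, z, n)] = point7(in, x, y, z, n, a, b);
}

/* Same sweep with the y and z loops tiled into by-by-bz blocks, so the
 * planes touched by one block are more likely to stay in cache.
 * The x loop is left whole to keep unit-stride access. */
void stencil_blocked(const double *in, double *out, int n,
                     double a, double b, int by, int bz) {
    assert(by > 0 && bz > 0);
    for (int zz = 1; zz < n - 1; zz += bz) {
        int zmax = (zz + bz < n - 1) ? zz + bz : n - 1;
        for (int yy = 1; yy < n - 1; yy += by) {
            int ymax = (yy + by < n - 1) ? yy + by : n - 1;
            for (int z = zz; z < zmax; z++)
                for (int y = yy; y < ymax; y++)
                    for (int x = 1; x < n - 1; x++)
                        out[idx(x, y, z, n)] = point7(in, x, y, z, n, a, b);
        }
    }
}

/* Returns 1 if both variants give identical output, 0 if they differ,
 * and -1 if memory allocation fails. Input values are arbitrary. */
int stencil_selfcheck(int n, int by, int bz) {
    assert(n > 0);
    size_t total = (size_t)n * (size_t)n * (size_t)n;
    double *in = malloc(total * sizeof *in);
    double *ref = calloc(total, sizeof *ref);
    double *blk = calloc(total, sizeof *blk);
    if (!in || !ref || !blk) {
        free(in); free(ref); free(blk);
        return -1;
    }
    for (size_t i = 0; i < total; i++)
        in[i] = (double)(i % 17) * 0.25;
    stencil_naive(in, ref, n, 0.5, 0.125);
    stencil_blocked(in, blk, n, 0.5, 0.125, by, bz);
    int same = 1;
    for (size_t i = 0; i < total; i++)
        if (ref[i] != blk[i])
            same = 0;
    free(in); free(ref); free(blk);
    return same;
}
```

Calling `stencil_selfcheck(18, 4, 5)` from a test program checks that tiling changes only the traversal order and not the result. Any speedup from blocking depends on the grid size, the block sizes and the cache hierarchy, so this sketch makes no performance claim.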

Keywords/Search Tags:

MIC, OpenCL, performance evaluation, vectorization, memory access pattern

Related items

1	Automatical Memory Access Pattern Analysis Based OpenCL Multi-device Shared Memory
2	Study On The Memory Access Pattern Analysis And Application Of Loop
3	Research Of SIMD Vectorization Optimization Based On Memory Access
4	Analytical Modeling And Validation For Access Performance Of DDR Memory System
5	The Performance Evaluation Research Of CFD Application On Intel MIC
6	The Improvement And Research Of DEM Based On OpenCL
7	Performance Model For Parallel Convolutional Neural Network Based On OpenCL
8	Research On Benefit Evaluation Techniques In Automatic Vectorization
9	Study Of Hardware Adaptive Prefetch Technology Based On Application Program Memory Access Pattern
10	Modeling Of Access Performance Evaluation And Optimization Research On Wireless Communication Networks