
Evaluating And Optimizing Scientific Applications On Many-Core Platforms

Posted on: 2015-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Gao
Full Text: PDF
GTID: 2348330509960883
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, as processor technology has advanced, using many-core processors as accelerators for heterogeneous parallel computing has become an important trend in high performance computing (HPC). Current mainstream many-core products include the general-purpose GPU (Graphics Processing Unit) and the more recently launched MIC (Many Integrated Core). Although heterogeneous many-core architectures offer high peak performance, their complexity makes them very difficult to program, which poses great challenges for developing, maintaining, and migrating applications. The emergence of cross-platform parallel programming models such as OpenCL (Open Computing Language) and OpenACC offers a chance to ease application migration. Code written in such models can run without modification on various platforms, such as CPU, GPU, and MIC. But what performance does such code achieve, and can it fully utilize the hardware resources after appropriate platform-specific tuning? These have become urgent questions for the HPC community.

This thesis, based on the MIC platform and the OpenCL programming model, analyzes in depth the factors that may affect the performance of scientific applications on many-core platforms, and proposes a systematic optimization method for each issue identified. The main contributions are as follows:

(1) Through in-depth evaluation and optimization of several OpenCL scientific applications on the MIC processor, we identify the main factors that affect performance and the specific ways in which they do so. Different forms of a single factor can change performance by more than 7 times at most. We further identify two critical factors: vectorization and the memory access pattern.

(2) For vectorization, we propose two explicit vectorization methods based on the OpenCL vector data types. Applying these methods to several scientific computing kernels, we find that explicit vectorization improves computing performance and, by changing the data access pattern, also improves bandwidth. With vector data types, performance improves by up to nearly 16 times. Compared with the compiler's implicit automatic vectorization, the explicit methods show little difference in performance but are more flexible and controllable.

(3) For memory access, we first analyze the use of OpenCL local memory on MIC and find that whether local memory is beneficial depends on the application itself; its use can be regarded as a software optimization technique. Then, based on an analysis of the Ops/Byte ratio of stencil computation, we propose a parallel temporal-spatial hybrid blocking algorithm to improve cache reuse, and apply it to the seven-point stencil, where it improves performance by 1.5 times compared with spatial blocking alone.
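To make the blocking terminology concrete, the following is a minimal Python sketch of a seven-point stencil swept three ways: naively, with spatial blocking, and with two time steps fused per tile using a one-point overlapped halo (a generic form of temporal blocking). It is not the thesis's parallel hybrid algorithm, whose details the abstract does not give. The grid size `N`, tile size `B`, stencil weights, fixed boundary values, and all function names are assumptions chosen for illustration.

```python
N = 11  # grid points per dimension (assumed; small so the check runs quickly)
B = 4   # tile edge length (assumed)

def idx(i, j, k):
    """Flatten a 3D index into the 1D list that stores the grid."""
    return (i * N + j) * N + k

def point(get, i, j, k):
    """One seven-point update; the 0.4 / 0.1 weights are assumed."""
    return 0.4 * get(i, j, k) + 0.1 * (
        get(i - 1, j, k) + get(i + 1, j, k)
        + get(i, j - 1, k) + get(i, j + 1, k)
        + get(i, j, k - 1) + get(i, j, k + 1))

def on_boundary(i, j, k):
    return min(i, j, k) == 0 or max(i, j, k) == N - 1

def tiles():
    """Yield the (start, stop) ranges of B x B x B tiles covering the interior."""
    starts = range(1, N - 1, B)
    for ii in starts:
        for jj in starts:
            for kk in starts:
                yield [(s, min(s + B, N - 1)) for s in (ii, jj, kk)]

def sweep_naive(src):
    """One time step over all interior points; boundary values stay fixed."""
    dst = list(src)
    get = lambda i, j, k: src[idx(i, j, k)]
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            for k in range(1, N - 1):
                dst[idx(i, j, k)] = point(get, i, j, k)
    return dst

def sweep_spatial(src):
    """Same time step, visited tile by tile (spatial blocking)."""
    dst = list(src)
    get = lambda i, j, k: src[idx(i, j, k)]
    for (i0, i1), (j0, j1), (k0, k1) in tiles():
        for i in range(i0, i1):
            for j in range(j0, j1):
                for k in range(k0, k1):
                    dst[idx(i, j, k)] = point(get, i, j, k)
    return dst

def two_steps_fused(src):
    """Two time steps per tile: step 1 on the tile plus a one-point halo
    (recomputed redundantly by neighbouring tiles), then step 2 on the tile."""
    dst = list(src)
    get_src = lambda i, j, k: src[idx(i, j, k)]
    for (i0, i1), (j0, j1), (k0, k1) in tiles():
        tmp = {}  # tile-local buffer holding step-1 values
        for i in range(i0 - 1, i1 + 1):
            for j in range(j0 - 1, j1 + 1):
                for k in range(k0 - 1, k1 + 1):
                    tmp[(i, j, k)] = (get_src(i, j, k) if on_boundary(i, j, k)
                                      else point(get_src, i, j, k))
        get_tmp = lambda i, j, k: tmp[(i, j, k)]
        for i in range(i0, i1):
            for j in range(j0, j1):
                for k in range(k0, k1):
                    dst[idx(i, j, k)] = point(get_tmp, i, j, k)
    return dst

grid = [float((7 * n) % 13) for n in range(N * N * N)]  # arbitrary test data
print(sweep_spatial(grid) == sweep_naive(grid))
print(two_steps_fused(grid) == sweep_naive(sweep_naive(grid)))
```

Both blocked variants compute exactly the same values as the naive sweeps and change only the order in which points are visited. The fused version trades redundant halo computation for touching each tile's data once per two time steps, which is the kind of cache reuse that temporal blocking aims for.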
Keywords/Search Tags:MIC, OpenCL, performance evaluation, vectorization, memory access pattern