Study Of Porting And Optimization Of GTC-P On Large Scale System Using OpenACC

Posted on:2019-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Wei

GTID:2428330590467389

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, with the rapid development of accelerators such as GPUs, accelerator-based heterogeneous computing has grown in popularity in high performance computing. As cluster architectures become more complex, running an application on different architectures often requires different versions of the code, which poses a great challenge to developers. OpenACC is a directive-based parallel programming model that provides performance on, and portability across, a wide variety of platforms, including GPUs, multicore CPUs, and many-core processors. GTC-P is a discovery-science-capable, real-world application code based on the Particle-In-Cell (PIC) algorithm, which is well established in the HPC area. Basic versions of this code have demonstrated performance portability on TOP500 supercomputers with different architectures, such as Titan and Mira^[1]. It is also included in the benchmark suite of NERSC, the US Department of Energy's supercomputing center^[2]. We use OpenACC to port and optimize GTC-P, starting from the OpenMP version of the code, and evaluate its performance portability on multiple platforms and on a large-scale system. With further optimizations, including data locality optimization, thread mapping optimization, and inserted CUDA code, we achieve a 4.2× speedup over the OpenMP code on a single node. The OpenACC version achieves over 90% of the performance of the CUDA version with only about 300 LOC. We perform a scaling evaluation on Titan with up to 4096 nodes and compare its performance with that of the CUDA version. The results show that OpenACC still achieves scalability comparable to CUDA on such a large-scale system.

The main contributions of this study are as follows. First, we implement and optimize the first OpenACC version of GTC-P. After further optimization, including data locality, thread mapping, and CUDA optimization, the OpenACC version achieves a 4.2× speedup. We observe that atomic operations have a great impact on performance, and we propose two different optimization methods to reduce the cost of atomics on x86 multicore CPUs and on GPUs. Second, as far as we know, this is the first time OpenACC has been used to port and evaluate an application on a system of this scale. We adjust the algorithm to reduce GPU memory usage through redundant computation, which enables us to simulate larger test cases. We scale the OpenACC code to up to 4096 nodes on Titan. OpenACC is shown to deliver impressive productivity and performance with respect to portability and scalability.
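
To illustrate why atomic operations matter in a PIC code, the following is a minimal sketch of a charge-deposition loop annotated with OpenACC directives. It is not taken from GTC-P: the function name `deposit_charge`, the 1D grid, and the linear weighting are simplifying assumptions made only for this example. Several particles can fall into the same cell, so the updates to `rho` must be atomic when the loop runs in parallel.

```c
#include <stddef.h>

/* Hypothetical 1D charge deposition with linear weighting (illustration only,
 * not the GTC-P kernel). Particle positions x[i] are given in units of the
 * grid spacing; rho has ngrid entries and must be zeroed by the caller. */
void deposit_charge(const double *x, const double *q, int np,
                    double *rho, int ngrid)
{
    /* Without an OpenACC compiler the pragmas are ignored and the loop
     * simply runs serially with the same result. */
    #pragma acc parallel loop copyin(x[0:np], q[0:np]) copy(rho[0:ngrid])
    for (int i = 0; i < np; ++i) {
        int j = (int)x[i];               /* index of the left grid point */
        if (j < 0 || j + 1 >= ngrid)     /* skip particles outside the grid */
            continue;
        double w = x[i] - (double)j;     /* fractional distance to the left point */

        /* Different particles may hit the same cell, so each update is atomic. */
        #pragma acc atomic update
        rho[j] += q[i] * (1.0 - w);
        #pragma acc atomic update
        rho[j + 1] += q[i] * w;
    }
}
```

In a loop of this shape every particle issues scattered atomic writes, and contention on heavily populated cells is one plausible reason such operations dominate run time. The specific optimizations used in the thesis are not described in this abstract and are not reproduced here.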

Keywords/Search Tags:

high performance computing, OpenACC, PIC, parallel computing, CUDA

Related items

1	Gpu Based On Particle Simulation High-performance Computing Systems
2	Study On High Performance Computing Method In Phylogenetic Tree Likelihood Estimation Of Protein
3	Research On General Purpose GPU Computing Technology In The High Performance Computing Platform
4	Design And Application Of High Performance Computing Platform
5	Design And Implementation Of Parallel SM4-GCM Based On CUDA
6	Modeling Of High Performance Computing On Many-core Processors
7	Implementation Of Two-dimensional DFT Parallel Algorithm On CUDA
8	Research Of Finite Element Method On GPU
9	Research Of High Performance Evolutionary Algorithm Based On Distributed Parallel Computing
10	Simulation Runner: A Lightweight Cloud-based HPC Platform