Porting And Optimizing GTC-P Code On Sunway TaihuLight Supercomputer

Posted on: 2020-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: L J Cai
GTID: 2428330623463620
Subject: Computer technology
Abstract/Summary:
Sunway TaihuLight is China's first supercomputer with a theoretical peak performance of over 100 PFlops. It is equipped with 40,960 Chinese home-grown SW26010 processors, each of which has a theoretical peak performance of 3.06 TFlops and a peak memory bandwidth of 134 GB/s. Compared with most general-purpose processors, the memory performance of the SW26010 is extremely limited. To explore the performance of large-scale memory-bound applications on TaihuLight, we select the GTC-P code as a case study. GTC-P is an important scientific application in plasma physics that has been successfully tested on many world-leading supercomputers. It uses the Particle-In-Cell algorithm and contains six major compute kernels, which involve intensive and irregular memory accesses. We have ported and optimized GTC-P using Sunway OpenACC and the Athread library, and evaluated the performance gap between the two versions. We found that the OpenACC version is impeded by the irregular memory access part of the Charge kernel and falls far short of the peak performance of the SW26010 processor. We therefore propose two optimizations for the irregular memory access part in the Athread version: register-level communication and an asynchronous strategy that uses the MPE and the CPEs. The Athread version achieves a 2.5x speedup over the OpenACC version. After porting and optimizing the code, we scaled GTC-P to 4,259,840 cores of TaihuLight and compared its performance with that on several world-leading supercomputers. Based on this work, we found that: 1) porting such memory-bound scientific applications via OpenACC can easily be impeded by the memory access capability of the CPEs; 2) designing a data-sharing scheme based on register-level communication is the key to improving the performance of the irregular memory access part of GTC-P on TaihuLight.
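The hardware figures quoted in the abstract can be turned into a rough back-of-the-envelope check of why a memory-bound code struggles on the SW26010. The sketch below is only illustrative: it uses the abstract's numbers (40,960 processors, 3.06 TFlops, 134 GB/s, 4,259,840 cores), a simple roofline-style bound that is not taken from the thesis, and one assumption not stated in the abstract, namely 260 cores per SW26010 (4 core groups, each with 1 MPE and 64 CPEs). All function names are hypothetical.

```python
# Back-of-the-envelope figures derived from the numbers in the abstract.
# This is an illustrative sketch, not the thesis's own performance model.

PROCESSORS = 40_960            # SW26010 processors in TaihuLight (abstract)
PEAK_TFLOPS_PER_PROC = 3.06    # theoretical peak per processor (abstract)
PEAK_BW_GBS_PER_PROC = 134.0   # peak memory bandwidth per processor (abstract)
CORES_USED = 4_259_840         # cores used in the scaling run (abstract)

# ASSUMPTION (not in the abstract): 4 core groups x (1 MPE + 64 CPEs).
CORES_PER_PROC = 4 * (1 + 64)


def system_peak_pflops() -> float:
    """Aggregate theoretical peak of the whole machine, in PFlops."""
    return PROCESSORS * PEAK_TFLOPS_PER_PROC / 1000.0


def machine_balance_bytes_per_flop() -> float:
    """Bytes of memory traffic the processor can sustain per peak flop."""
    return PEAK_BW_GBS_PER_PROC / (PEAK_TFLOPS_PER_PROC * 1000.0)


def attainable_gflops(flops_per_byte: float) -> float:
    """Simple roofline bound: min(compute peak, intensity * bandwidth)."""
    compute_peak = PEAK_TFLOPS_PER_PROC * 1000.0
    return min(compute_peak, flops_per_byte * PEAK_BW_GBS_PER_PROC)


def processors_used() -> int:
    """Processors implied by the core count, under the 260-core assumption."""
    return CORES_USED // CORES_PER_PROC


if __name__ == "__main__":
    print(f"system peak:      {system_peak_pflops():.1f} PFlops")
    print(f"machine balance:  {machine_balance_bytes_per_flop():.4f} bytes/flop")
    print(f"bound at 1 flop/byte: {attainable_gflops(1.0):.0f} GFlops per processor")
    print(f"processors in scaling run: {processors_used()}")
```

Under this bound, a kernel that performs about one floating-point operation per byte moved is limited to roughly 134 GFlops per processor, a small fraction of the 3.06 TFlops peak, which is consistent with the abstract's observation that memory access dominates the ported code's performance.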
Keywords/Search Tags: Sunway TaihuLight, GTC-P, OpenACC, Optimization