
Implementation And Optimization Of HPCG On Multi-core And Many-core Platform

Posted on: 2019-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Liao
Full Text: PDF
GTID: 2428330545977031
Subject: Computer system architecture
Abstract/Summary:
The Sunway TaihuLight is the newest supercomputer developed independently in China and the first system in the world with a peak performance above 100 petaflops. The system consists of 40,960 SW26010 heterogeneous many-core processors with a total of about 10.6 million computing cores. Each Sunway processor is composed of four Core Groups (CGs), and each CG has 65 computing cores: one Management Processing Element (MPE) and 64 (8 x 8) Computing Processing Elements (CPEs). Peak performance is more than 3 TFlops per chip and 125 PFlops for the whole system, and LINPACK reaches about 70% of that peak. However, the ratio of High Performance Conjugate Gradients (HPCG) performance to LINPACK performance is only 0.4%, which shows that in-depth optimization of the application is necessary and that architectural improvements to the system may also be needed for applications like HPCG.

As the newest HPC benchmark, HPCG was proposed to measure real-world performance from several angles by solving a system of linear equations. It exercises computation, irregular memory access, different communication patterns and so on, which makes it more representative of real applications. The goal of this thesis is to implement HPCG in parallel on multi-core and many-core platforms, taking both the algorithm and the architecture into account, in order to gain an in-depth understanding of how it adapts to different architectures and to provide suggestions for refactoring and optimizing other HPC applications and for developing the next generation of the Sunway TaihuLight system. The research and contributions of this thesis cover the following three aspects.

First, implementation and optimization of HPCG on Xeon multi-core and many-core processors. On the one hand, we thoroughly analyze the software structure of HPCG and profile its runtime characteristics, such as computation and memory behavior, with various performance analysis tools, in order to find the performance bottlenecks and identify the data dependencies of the problem. On the other hand, we apply several different strategies to parallelize HPCG on multi-core and many-core processors (including Xeon CPU, GPU P100 and KNL) and optimize its temporal and spatial locality. A full understanding of the characteristics of multi-core processors and of HPCG performance on such architectures provides prior knowledge for implementing and optimizing HPCG on Sunway and other heterogeneous platforms.

Second, implementation and optimization of HPCG on the Sunway processor. Based on the runtime behavior of HPCG and the architecture of the Sunway processor, we propose four different methods to parallelize HPCG on one Core Group: Multi-Coloring (MC), Level-Scheduling (LS), 0-1, and Hierarchical Grid Collaborative (HGC), which is better matched to the Sunway processor. In addition, we apply a series of optimization schemes on a single node and across multiple nodes. On one CG, HPCG is optimized in several respects, for example the data transfer scheme, the working mode of the MPE and CPEs, and the synchronization method. Across multiple nodes, we change the data transmission mode between neighboring nodes and design a software cache scheme to fetch boundary data. Experiments show that, after optimization, the four strategies achieve speedups of about 1.54x, 5.52x, 10.9x and 15.6x, respectively, on one CG, and reach 192 TFlops when scaled to 81,920 MPI processes (5,324,800 cores) with 70% node-level parallel efficiency.

Finally, comparison of the implementation and optimization strategies of HPCG on multi-core and many-core processors. Combining the HPCG performance on the Xeon multi-core processor, the Sunway many-core processor, GPU P100 and KNL, we analyze the architectural and performance differences among these platforms for applications similar to HPCG, namely which architecture is better suited to this kind of application. We discuss the relationship among algorithm, architecture and performance in terms of parallel methods, memory bandwidth, SIMD and other aspects. We then put forward suggestions for the development of the next generation of the Sunway TaihuLight system.
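Multi-Coloring and Level-Scheduling, two of the strategies named in the abstract, are standard ways to expose parallelism in a sweep with data dependencies, such as the symmetric Gauss-Seidel smoother that HPCG uses. The sketch below is only an illustration of the two ideas and is not taken from the thesis. It assumes a small 2D grid with a 5-point stencil (HPCG itself uses a 27-point 3D stencil), a simple greedy coloring, and hypothetical function names (`grid_adjacency`, `greedy_coloring`, `level_schedule`, `group_by_label`).

```python
from collections import defaultdict


def grid_adjacency(nx, ny):
    """Neighbors of each node for a 5-point stencil on an nx-by-ny grid.

    Nodes are numbered row-major; this 2D grid is an assumed toy problem.
    """
    adj = {}
    for y in range(ny):
        for x in range(nx):
            neighbors = []
            for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                xx, yy = x + dx, y + dy
                if 0 <= xx < nx and 0 <= yy < ny:
                    neighbors.append(yy * nx + xx)
            adj[y * nx + x] = neighbors
    return adj


def greedy_coloring(adj, n):
    """Multi-coloring: each node gets the smallest color no neighbor has.

    Nodes of the same color share no edge, so they can be updated in parallel.
    """
    color = [-1] * n
    for i in range(n):
        used = {color[j] for j in adj[i] if color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color


def level_schedule(adj, n):
    """Level scheduling for a forward sweep in natural order.

    Node i depends on its lower-numbered neighbors, so its level is one more
    than the deepest of them. Nodes on the same level are independent.
    """
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j in adj[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    return level


def group_by_label(labels):
    """Collect node ids that share a color or a level, ordered by label."""
    groups = defaultdict(list)
    for node, label in enumerate(labels):
        groups[label].append(node)
    return [groups[k] for k in sorted(groups)]


if __name__ == "__main__":
    nx, ny = 4, 4
    adj = grid_adjacency(nx, ny)
    colors = greedy_coloring(adj, nx * ny)
    levels = level_schedule(adj, nx * ny)
    print("nodes per color:", [len(g) for g in group_by_label(colors)])
    print("nodes per level:", [len(g) for g in group_by_label(levels)])
```

On this toy grid, coloring yields a few large groups that can each be swept in parallel, but it changes the update order relative to the natural ordering. Level scheduling preserves the natural order, at the cost of more and smaller groups and therefore more synchronization points. That trade-off is one plausible reason the strategies perform differently on a many-core Core Group.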
Keywords/Search Tags: Sunway TaihuLight supercomputer, HPCG, heterogeneous many-core processor, parallelization, implementation, optimization, differences