The Research Of High Performance Algorithm For GROMACS Based On Sunway TaihuLight

Posted on:2021-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:T J Zhang

Full Text:PDF

GTID:2428330602983772

Subject:Computer Science and Technology

Abstract/Summary:


Sunway TaihuLight is a Chinese supercomputer developed by the National Research Center of Parallel Computer Engineering and Technology and installed at the National Supercomputing Center in Wuxi. It is the first supercomputer in China built entirely from domestically designed chips, and it has ranked first on the TOP500 list. Its theoretical peak performance exceeds 120 PFlops. Sunway TaihuLight is also the first Chinese supercomputer whose applications have won the Gordon Bell Prize, the highest award for supercomputer applications. Its emergence is of great significance for supercomputing in China and for the applications that run on such systems.

GROMACS is a widely used molecular dynamics package that can simulate many proteins and other macromolecules. Because its simulations are fast, it is widely applied in macromolecular materials, pharmaceuticals, and biochemistry. It also has a very strong demand for computing resources, so porting GROMACS to Sunway TaihuLight and making full use of the machine is both important and necessary.

Sunway TaihuLight is built mainly from SW26010 processors, a typical heterogeneous many-core architecture. Each chip has four core groups, and each core group has one management processing element (MPE) and 64 computing processing elements (CPEs); each CPE has 64 KB of local memory. For GROMACS, the low bandwidth between the CPEs and main memory seriously limits performance. Making GROMACS perform well on Sunway TaihuLight is therefore urgent but very challenging.

In this work we adopt or propose several strategies to improve performance:

1. A data reconstruction strategy.
2. A software cache implemented on the CPEs.
3. A delayed update strategy.
4. Vectorization of the short-range force calculation.
5. Backup arrays to resolve read-write conflicts.
6. A tag update strategy to improve performance.
7. Optimization of the table-building process to improve overall performance.

Through these measures, we relieve bandwidth pressure mainly by increasing the bandwidth achieved between the slave cores (CPEs) and main memory and by making better use of each transfer between them.

With these strategies, program performance improves markedly. The core computing part achieves a 60x speedup, and overall performance improves by 30x. Overall performance is six to seven times faster than the best existing implementation. Our approach also compares favorably with other strategies and can be applied to other platforms and applications.
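
The second strategy listed in the abstract, a software cache on the CPE, can be illustrated with a minimal sketch in C. The abstract does not describe the actual design, so everything here is an assumption made for illustration: the cache is read-only and direct-mapped, the line size and line count are arbitrary, `memcpy` stands in for a bulk DMA transfer from main memory into the CPE's local memory, and the names `SoftCache`, `cache_init`, and `cache_read` are hypothetical. The point it shows is that one bulk fetch serves several later reads of neighboring elements, which reduces the number of main-memory transfers.

```c
#include <assert.h>
#include <string.h>

#define LINE_ELEMS 8   /* elements per cache line (assumed value) */
#define NUM_LINES  16  /* lines kept in local memory (assumed value) */

/* Hypothetical read-only, direct-mapped software cache. */
typedef struct {
    const double *main_mem;              /* backing array in main memory */
    long n;                              /* number of elements in main_mem */
    double data[NUM_LINES][LINE_ELEMS];  /* locally held copies of blocks */
    long tag[NUM_LINES];                 /* block index per line, -1 if empty */
    long hits, misses;                   /* counters for inspection only */
} SoftCache;

static void cache_init(SoftCache *c, const double *main_mem, long n) {
    assert(n % LINE_ELEMS == 0);  /* keeps every block fetch in bounds */
    c->main_mem = main_mem;
    c->n = n;
    c->hits = 0;
    c->misses = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        c->tag[i] = -1;
    }
}

/* Return element idx, fetching its whole block on a miss. */
static double cache_read(SoftCache *c, long idx) {
    assert(idx >= 0 && idx < c->n);
    long block = idx / LINE_ELEMS;
    int line = (int)(block % NUM_LINES);  /* direct-mapped placement */
    if (c->tag[line] != block) {
        /* memcpy stands in for one bulk DMA get from main memory */
        memcpy(c->data[line], c->main_mem + block * LINE_ELEMS,
               sizeof(double) * LINE_ELEMS);
        c->tag[line] = block;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[line][idx % LINE_ELEMS];
}
```

With these assumed sizes, reading 64 consecutive elements through `cache_read` triggers 8 block fetches and 56 hits. A direct-mapped layout keeps the lookup to one division and one comparison, at the cost of evictions when two blocks in use map to the same line.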

Keywords/Search Tags:

Sunway TaihuLight, High Performance Computing, Massive Parallelization, Molecular Dynamics, GROMACS, Bandwidth, Optimization


Related items

1	Optimization Of Molecular Dynamics Algorithms Based On The Sunway TaihuLight Supercomputer
2	Parallel Implementation And Performance Optimization For Refactoring GROMACS On The Sunway Many-core Architecture
3	The Design And Optimization Of High-performance Molecular Dynamics Algorithms On The Sunway TaihuLight Supercomputer
4	Implementing Molecular Dynamics Simulation On The Sunway TaihuLight System With Heterogeneous Many-Core Processors
5	Design And Implementation Of Heterogeneous Parallel Algorithms On The Sunway TaihuLight
6	Parallel Deep Learning Training System On Sunway TaihuLight
7	Porting And Optimizing GTC-P Code On Sunway TaihuLight Supercomputer
8	Optimization Of Molecular Dynamics Simulation Algorithm On Sunway Supercomputers
9	Implementation And Optimization Of Molecular Dynamics Application On Sunway TaihuLight Supercomputer
10	The Optimization Of The Tend_lin Application Task Graph Parallel On Sunway TaihuLight Supercomputer