Research on High-Performance Algorithms for GROMACS on Sunway TaihuLight

Posted on: 2021-01-05  Degree: Master  Type: Thesis
Country: China  Candidate: T J Zhang
GTID: 2428330602983772  Subject: Computer Science and Technology
Abstract/Summary:
Sunway TaihuLight is a domestic supercomputer developed by the National Research Center of Parallel Computer Engineering and Technology and installed at the National Supercomputing Center in Wuxi. It is the first supercomputer in China built entirely from domestic chips, and it has ranked first on the Top500 list. Its theoretical peak performance exceeds 120 PFlops. Sunway TaihuLight is also the first supercomputer in China on which an application won the Gordon Bell Prize, the highest award for supercomputer applications. Its emergence is of great significance both to China's supercomputers and to the applications that run on them.

GROMACS is a widely used molecular dynamics application that can simulate many proteins and other macromolecules. Because it simulates quickly, it is valuable in macromolecular materials, pharmaceuticals, and biochemistry. It also places heavy demands on computing resources, so porting GROMACS to Sunway TaihuLight and making full use of the machine is both important and necessary.

Sunway TaihuLight is built mainly from the sw26010, a typical many-core heterogeneous chip. Each chip has four core groups. Each core group has one management processing element (MPE) and 64 computing processing elements (CPEs), and each CPE has 64 KB of local cache. For GROMACS, the low bandwidth between the CPEs and main memory seriously limits performance. Making GROMACS perform well on Sunway TaihuLight is therefore urgent but challenging.

In this work, we adopt or propose the following strategies to improve performance:

1. Restructure the data layout.
2. Implement a software cache on the CPEs.
3. Implement a delayed update strategy.
4. Vectorize the short-range force calculation.
5. Use backup arrays to resolve read and write conflicts.
6. Use a tag update strategy.
7. Optimize the table-building process to improve overall performance.

Together, these measures relieve bandwidth pressure by raising the effective bandwidth between the slave cores (CPEs) and main memory and by making better use of each transfer between them.

With these strategies, program performance improves markedly. The core computing part achieves a 60x speedup, and the application as a whole achieves a 30x speedup. Overall performance is six to seven times faster than the best existing implementation. Our approach also compares favorably with other strategies and can be applied to other platforms and applications.
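The abstract does not describe how the software cache is implemented, so the following is only a minimal sketch of the general idea, written in portable C rather than against the Sunway programming interface. It assumes a direct-mapped layout; the names SoftCache, cache_init, and cache_read, and the sizes LINE_ELEMS and NUM_LINES, are hypothetical, and memcpy stands in for a bulk DMA transfer from main memory into a CPE's local memory. The point it illustrates is that one bulk fetch can serve several later reads, which reduces the number of trips to main memory.

```c
#include <assert.h>
#include <string.h>

#define LINE_ELEMS 8   /* elements moved per simulated bulk transfer (assumed) */
#define NUM_LINES  16  /* cache lines held in local memory (assumed) */

/* Hypothetical direct-mapped software cache kept in fast local memory. */
typedef struct {
    long   tag[NUM_LINES];               /* main-memory line held in each slot, -1 if empty */
    double data[NUM_LINES][LINE_ELEMS];  /* cached copies of those lines */
    long   hits;
    long   misses;
} SoftCache;

static void cache_init(SoftCache *c) {
    for (int i = 0; i < NUM_LINES; i++) {
        c->tag[i] = -1;
    }
    c->hits = 0;
    c->misses = 0;
}

/* Read main_mem[idx] through the cache.
 * On a miss the whole line is fetched at once; memcpy is a stand-in
 * for a DMA get and is NOT the real Sunway interface.
 * The main_mem array length must be a multiple of LINE_ELEMS. */
static double cache_read(SoftCache *c, const double *main_mem, long idx) {
    assert(idx >= 0);
    long line = idx / LINE_ELEMS;
    int  slot = (int)(line % NUM_LINES);

    if (c->tag[slot] != line) {
        memcpy(c->data[slot], main_mem + line * LINE_ELEMS,
               sizeof(double) * LINE_ELEMS);
        c->tag[slot] = line;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[slot][idx % LINE_ELEMS];
}
```

In this sketch, reading 64 consecutive elements costs 8 bulk fetches and 56 local hits. Irregular access patterns such as neighbor-list lookups would hit less often, which is presumably why the thesis pairs the cache with data restructuring.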
Keywords/Search Tags: Sunway TaihuLight, High Performance Computing, Massive Parallelization, Molecular Dynamics, GROMACS, Bandwidth, Optimization