
Optimization Of Molecular Dynamics Algorithms Based On The Sunway TaihuLight Supercomputer

Posted on: 2021-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Duan
Full Text: PDF
GTID: 1368330602482461
Subject: Computer Science and Technology
Abstract/Summary:
Chemistry, biology, materials science, and medical science are under continuous development, which creates high demand for studying the microscopic world. Observation at the molecular or atomic level is still expensive, and continuous observation of microscopic processes is even harder. Some of these processes also take place over very short periods, making them difficult to capture in the laboratory. Computer simulation is therefore an important way to gain knowledge of the microscopic world in these fields.

First-principles simulation, Monte Carlo simulation, and molecular dynamics simulation are the major methods for simulating the microscopic world. First-principles simulation is mainly based on density functional theory and can simulate electronic orbitals precisely, but it consumes so many computing resources that it is mostly used for small systems of tens or hundreds of atoms over short time spans. Monte Carlo simulation is based on a minimization method and has a much lower computational workload. It can deduce the structure of molecules, but it is not suitable for simulating dynamic processes. Molecular dynamics simulation is based on Newton's laws of motion and uses empirical parameters to model the interactions among particles. Because it offers both good efficiency and good precision, it has become a widely used method for simulating microscopic systems.

The growing use of molecular dynamics in scientific domains brings increasing demand for efficiency, and the time and space scales of the simulations keep growing. To meet this demand, molecular dynamics applications adopt newer technologies and devices: GROMACS makes good use of SIMD instruction sets from SSE2 to AVX-512; LAMMPS has acceleration packages for SIMD, CUDA, OpenCL, and the KOKKOS programming model, so it can run on various computing devices; AMBER is intensively optimized for NVIDIA's GPU platform, making it an outstanding tool for simulating biological systems on GPUs. The demand for higher efficiency has also led to ASIC-based architectures such as the Anton and MDGRAPE series. With customized computing pipelines and networks, these architectures greatly reduce the latency of a single iteration and enable simulations at millisecond time scales. As for space scale, the growing computing capability and memory size of heterogeneous processors make simulations of millions of atoms on a single computer routine, and supercomputers can simulate billions or trillions of atoms.

The Sunway TaihuLight supercomputer was released in 2016. It consists of 40,960 SW26010 processors and has a peak performance (Rpeak) of 125 PFlops. It is the first supercomputer built on Chinese homegrown processors to top the TOP500 list. Radical architectural changes in the SW26010 processor give it both high computing power and high energy efficiency. These changes also alter the memory model and the programming model, which makes it hard for existing scientific software to exploit the computing power of TaihuLight. Although a master-slave heterogeneous computation model like that of the SW26010 is also found on platforms with GPUs or MICs, SW26010 nodes have no high-speed intermediate memory such as GDDR or HBM. Instead, the SW26010 transfers data directly between shared memory and the on-chip memory of the computation cores, which requires redesigning the memory access pattern. Moreover, the on-chip memory of the computation cores is not an ordinary cache but a software-controlled 64 KiB LDM. This brings both challenges and opportunities to software development on the SW26010: on the one hand, software needs extensive changes to use the computation cores; on the other hand, data movement in and out of the LDM can be controlled at a finer grain, so the LDM can be used more effectively.

The major difficulties of performing molecular dynamics simulations on the Sunway TaihuLight supercomputer are:

1) Molecular dynamics applications are memory-bound and perform many random memory accesses, but the SW26010 processor has poor memory bandwidth and its DMA instructions require large memory access blocks.
2) Existing multi-process frameworks in molecular dynamics applications are not suitable for CPEs, so a multi-threading framework must be created.
3) Molecular dynamics applications perform random memory updates, which cause write conflicts in a multi-threading environment, and there is little experience in solving this problem on the SW26010 processor.
4) The computation in molecular dynamics applications is complex. For example, solving valence angles and bond orders requires many evaluations of transcendental functions.

To solve these problems, this thesis proposes several schemes for optimizing potential function computation and neighborhood index construction. It also presents the design and implementation of general-purpose modules on the Sunway platform. These modules not only support research on molecular dynamics applications on TaihuLight but also benefit other work on the TaihuLight supercomputer. The major contributions of this thesis are as follows:

1. The design of a simple and efficient software cache, and of a neighbor-list-free method for reading data for potential function computation. The thesis makes trade-offs in the design of the software cache according to average memory access time (AMAT) theory and provides an efficient software implementation. The neighbor-list-free method matches the SW26010's hardware architecture very well. Both methods make good use of the memory of the SW26010 processor.
2. The design of one-sided memory update, hybrid memory update, and self-adaptable memory update for solving write conflicts during memory updates. The one-sided update uses redundant computation to avoid write conflicts. The hybrid memory update has a parallel-computation, serial-update framework, carried out in an MPE-CPE cooperative scheme. The self-adaptable memory update applies an inspector-executor scheme to achieve efficient replica allocation. All these methods eliminate write conflicts in the computation of different potentials.
3. The design of vector shuffling, parameter profiles, and lookup-free transcendental functions for vectorization. These strategies enable flexible vectorization. In addition, a particle-to-cell cutoff filter and a vector shortcut scheme are introduced, which reduce the computation workload by almost one third.
4. The design of bundled cell scanning for building neighbor lists, and of incremental construction of the cell index. Bundled cell scanning improves data reuse and enlarges the memory access block. Incremental construction of the cell index exploits the continuity of particle movement to obtain better locality and effectively accelerates the construction of the cell index.
5. The design and implementation of a series of general-purpose modules, for example SWCache, an efficient software cache module; LWPF, a performance counter management module; and a lookup-free transcendental function library. These modules not only support the work in this thesis but also assist other research on the TaihuLight supercomputer.

This thesis also evaluates the work above. The evaluations are done in LAMMPS, AMBER, and ESMD, a self-written molecular dynamics framework. Experiments show that the work in this thesis provides competitive performance on the SW26010 processor in comparison with other commercial processors. The experiments also achieved a simulation of over 275 billion atoms on 16,384 nodes with a sustained performance of 2.43 PFlops.
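
The one-sided memory update in contribution 2 trades redundant computation for conflict-free writes. The following is a minimal sketch of that general idea, not the thesis's implementation: it assumes a Lennard-Jones pair potential in reduced units and plain all-pairs loops instead of neighbor lists or cells, and all function names are hypothetical. In the two-sided variant each pair is visited once and both particles are updated, so parallel workers could write to the same particle. In the one-sided variant every pair is evaluated twice, but the iteration for particle i writes only to f[i].

```python
import numpy as np


def lj_pair_force(rij, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force on particle i from particle j, with rij = x_i - x_j."""
    r2 = float(np.dot(rij, rij))
    s6 = (sigma * sigma / r2) ** 3
    return 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2 * rij


def forces_two_sided(x, cutoff):
    """Visit each pair once and update both particles (Newton's third law)."""
    f = np.zeros_like(x)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            rij = x[i] - x[j]
            if np.dot(rij, rij) < cutoff * cutoff:
                fij = lj_pair_force(rij)
                f[i] += fij
                f[j] -= fij  # write to a particle other than i: conflict-prone
    return f


def forces_one_sided(x, cutoff):
    """Visit each pair from both ends; iteration i writes only to f[i]."""
    f = np.zeros_like(x)
    n = len(x)
    for i in range(n):  # each i could run on a separate worker without conflicts
        for j in range(n):
            if i == j:
                continue
            rij = x[i] - x[j]
            if np.dot(rij, rij) < cutoff * cutoff:
                f[i] += lj_pair_force(rij)  # redundant: pair (j, i) is recomputed later
    return f


def jittered_lattice(n_side=3, spacing=1.2, jitter=0.05, seed=0):
    """Hypothetical test system: a small cubic lattice with random displacements."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_side) * spacing
    x = np.array([[a, b, c] for a in grid for b in grid for c in grid], dtype=float)
    return x + rng.uniform(-jitter, jitter, size=x.shape)
```

Calling `forces_one_sided(jittered_lattice(), 2.5)` and `forces_two_sided(jittered_lattice(), 2.5)` should give the same forces up to floating-point rounding. The one-sided version performs roughly twice as many pair evaluations, which is the cost paid for removing the shared writes.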
Keywords/Search Tags: Sunway TaihuLight supercomputer, Molecular dynamics, Neighbor-list-free method, Vectorization, Massive parallelization, High performance computing