Research On Algorithm Design And Optimization Methods Of Molecular Biology Applications For The Domestic Sunway Manycore System

Posted on: 2021-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Chen
GTID: 1360330602996223
Subject: Computer system architecture
Abstract/Summary:
Biomacromolecules such as proteins and their complexes are delicate machines at the atomic level, and they are also drug targets. By simulating their structural evolution at atomic resolution and combining the results with statistical thermodynamics, we can reconstruct the time course of molecular dynamics. Analyzing the driving mechanisms provides a basis for interpreting and predicting biological function and for drug design. With the computing power of contemporary supercomputers, the run time of a molecular simulation can be reduced from weeks or even months to days or even hours, which greatly promotes the application of molecular simulation in basic biological research and drug design.

The Sunway TaihuLight supercomputer is built with the second-generation domestic Sunway many-core processor and ranked first on the international Top500 supercomputer list four times. Compared with contemporary commercial multi-core processors, this processor has more parallel computing units and a distinct on-chip memory hierarchy, which makes it better suited to regular, easily parallelized, compute-intensive tasks. Unlike other commercial many-core processors, it has no mature parallel programming or runtime environment (such as CUDA for GPUs) designed for its unique architecture, so parallel algorithm design and performance optimization on it face many challenges. Algorithm design and optimization methods driven by significant applications are urgently needed.

The core algorithms of molecular biology applications are sparse computational problems with indirect and discrete memory access patterns. This dissertation studies parallel design and performance optimization methods for them on the Sunway many-core processor, making full use of its massive on-chip computing units and inter-core communication. The key technical issues addressed are how to overcome the severe memory bandwidth limitations and how to exploit the strengths of the on-chip computing and communication hardware. The main research contents are:

1. A parallel computing method that employs a computing core as an on-chip shared write buffer to overcome the memory constraints. Each Sunway computing core has only one private local data memory with limited space. The computing cores can exchange data through register communication over the on-chip interconnect network. To remove the excessive memory access overhead caused by the limited local data memory and by write conflicts when updating forces, we harness this hardware feature in a parallel scheme that separates the computing cores (CPEs) into computation cores and a storage core. The storage core serves as the shared write buffer for the computation cores, which send it force data through register communication. Furthermore, in view of the hardware features and limitations of register communication, we propose an efficient optimization of the data communication between the computation cores and the shared write buffer core. Experimental results show that the algorithm achieves a maximum speedup of more than 30x over the master core, and that the overhead of off-chip memory access, inter-core communication and load imbalance is reduced to a reasonable level.

2. A task blocking method and multi-level parallel scheme for highly efficient molecular dynamics simulation on the Sunway many-core architecture, which improve locality and parallelism. The parallel efficiency of molecular dynamics simulation on the Sunway many-core processor is greatly constrained by random memory access and write conflicts. We propose a super-cluster neighbor list, which increases the granularity of the non-bonded interactions, to enhance both temporal and spatial locality and to maximize the computation-to-memory-access ratio of the computing kernel. Because the Sunway many-core processor has no low-overhead locking mechanism, using multiple force replicas is a more feasible way to avoid write conflicts. However, the resulting data reduction overhead limits parallel efficiency. We propose a multi-level parallel scheme, which combines computing cores into groups that exchange data through register communication, to balance these two concerns. Experimental results show that the optimized computing kernel achieves a speedup of 226x over the master core and reaches 20% of the theoretical floating-point performance of the Sunway many-core processor.

3. A large-scale parallel computing scheme for molecular docking simulation on the Sunway TaihuLight system, which enables efficient parallelization of high-throughput virtual screening at a system scale of 10 million cores. Existing molecular docking applications are mostly designed for high-performance computing systems built on general-purpose multi-core processors and cannot utilize the tremendous computing power of many-core platforms. First, we propose a parallel optimization method for the single docking process on the Sunway many-core processor. Because the original data structures are ill-suited to the Sunway many-core architecture, we propose a series of data structure adjustments: eliminating redundancy in key in-memory data structures to shrink the program working set, compressing the non-bonded atom pairs, and redesigning the energy grid with a redundant structure. In addition, loop unrolling and vectorization exploit the instruction-level and data-level parallelism of the computing kernel. Second, because the single-level master-slave parallel mode adopted by the molecular docking algorithm puts heavy I/O and communication pressure on the task scheduling node and limits scalability, we propose a multi-level master-slave parallel scheme, together with lightweight I/O task partitioning among the scheduling nodes and an asynchronous non-blocking communication interface. Experimental results show that the optimized molecular docking simulation achieves efficient parallel computing and rapid virtual drug screening on the Sunway TaihuLight system. The platform reduces the time to complete virtual screening of nearly 40 million compounds known to humans from one day to one hour.

The parallel algorithm design and optimization methods proposed in this dissertation have been applied to molecular dynamics simulation and virtual drug screening software on the Sunway TaihuLight system. They address a series of challenges in parallelizing this software efficiently and achieve near-linear acceleration of the core algorithms on this many-core processor. Furthermore, these design and optimization methods can serve as a reference for the efficient implementation of other sparse computing problems on the Sunway many-core processor. They are also valuable references for improving the next-generation Sunway many-core processor architecture and building its software ecosystem.
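
As a rough illustration of the trade-off described in item 2, the following Python sketch accumulates pair forces into one replica per group of cores rather than one replica per core, then reduces the replicas into a single force array. It is a serial toy model under stated assumptions, not the dissertation's implementation: the 1-D spring force, the round-robin assignment of pairs to cores, and the names `pair_force`, `accumulate_direct` and `accumulate_grouped` are all hypothetical, and the in-group data exchange that the abstract attributes to register communication is modeled simply as cores of one group writing to a common list.

```python
def pair_force(xi, xj):
    """Toy 1-D pair force (assumption): a linear spring pulling i toward j."""
    return xj - xi


def accumulate_direct(positions, pairs):
    """Reference result: every pair updates one shared force array."""
    forces = [0.0] * len(positions)
    for i, j in pairs:
        f = pair_force(positions[i], positions[j])
        forces[i] += f  # action on atom i
        forces[j] -= f  # equal and opposite reaction on atom j
    return forces


def accumulate_grouped(positions, pairs, n_cores, group_size):
    """Accumulate into one force replica per core group, then reduce.

    With group_size == 1 this is the plain one-replica-per-core scheme;
    larger groups leave fewer replicas to reduce at the end.
    """
    n_groups = (n_cores + group_size - 1) // group_size
    replicas = [[0.0] * len(positions) for _ in range(n_groups)]

    for core in range(n_cores):
        # Cores in the same group share a replica (modeled serially here).
        replica = replicas[core // group_size]
        # Round-robin work split: core k takes pairs k, k + n_cores, ...
        for i, j in pairs[core::n_cores]:
            f = pair_force(positions[i], positions[j])
            replica[i] += f
            replica[j] -= f

    # Reduction step: its cost grows with the number of replicas.
    forces = [sum(column) for column in zip(*replicas)]
    return forces, n_groups


if __name__ == "__main__":
    positions = [0.0, 1.0, 3.0, 6.0]
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    grouped, n_replicas = accumulate_grouped(positions, pairs, n_cores=4, group_size=2)
    print(n_replicas, grouped == accumulate_direct(positions, pairs))
```

Larger groups leave fewer replicas to reduce but put more cores behind each replica; the sketch only checks that the grouped result matches direct accumulation and says nothing about real performance on the hardware.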
Keywords/Search Tags: High performance computing, Domestic many-core, Molecular dynamics simulation, Molecular docking, Parallel algorithm