
High-performance Molecular Dynamics Research Based On Non-von Neumann Architecture Chip

Posted on: 2024-06-18  Degree: Doctor  Type: Dissertation
Country: China  Candidate: P H Mo  Full Text: PDF
GTID: 1528307334978169  Subject: Electronic Science and Technology
Abstract/Summary:
Molecular dynamics (MD) is a technique for the computer simulation of complex atomic systems. Whereas physical experiments can rarely observe tiny changes in a system directly, MD can follow the evolution of the system over time and extract physical properties of interest from it. MD is therefore widely used in fields such as physics, chemistry, biology, materials science, semiconductor chips, and national defense. However, as demands on simulation accuracy grow along with simulation size and duration, the conflict between the accuracy and the speed of MD becomes ever more prominent. Both depend on how the potential energy surface is evaluated. Ab-initio molecular dynamics (AIMD) provides an accurate potential energy surface, but its computational complexity is O(N^3), which is prohibitively expensive. Classical molecular dynamics (CMD) based on empirical force fields is far more efficient, with O(N) complexity, but the form of its potential energy surface is so simple that its reliability is hard to guarantee. In recent years, machine-learning molecular dynamics (MLMD) has been proposed to alleviate this conflict: a neural network potential (NNP) is trained on AIMD samples to reach AIMD-level accuracy, while the network's O(N) complexity improves efficiency from the software side. However, owing to the "memory wall" bottleneck of the von Neumann (vN) hardware architecture, MLMD is still 1-2 orders of magnitude slower than CMD. In addition, semiconductor process nodes have entered the sub-10 nm range, close to the physical limit, making it difficult to sustain the rapid growth of processor computing power. Meeting the enormous demands on MD accuracy and speed therefore requires further work at the hardware architecture level to
develop accurate and efficient MD computational tools.

To address this problem, this thesis investigates high-performance MD based on non-von Neumann (NvN) architecture chips. It proposes an NvN chip dedicated to NNP inference that adopts processing-in-memory technology, breaking the "memory wall" bottleneck inherent in vN chips and further improving computing efficiency. An MD-specific heterogeneous parallel computing system is then designed around the NvN and vN chips, and precision and speed tests on a variety of atomic systems verify the generality and effectiveness of the design. The heterogeneous system is subsequently deployed on a large-scale multi-node parallel cluster, achieving load balancing and further improving computing efficiency. In addition, the thesis proposes an NNP training method based on transfer learning, which greatly reduces the number of AIMD samples required, cutting the computational cost of sampling while preserving high model accuracy.

The main innovations of this thesis are as follows:

(1) A specialized MD chip based on the NvN architecture is designed to address the "memory wall" bottleneck that limits the efficiency of MLMD calculations. First, the NNP model's algorithm design is optimized to obtain a lightweight model that is easy to deploy on the NvN chip. A hardware algorithm for model inference is then implemented on the chip, exploiting the processing-in-memory advantage for acceleration: efficient pipelined computation avoids heavy shuttling of intermediate results, and the model parameters are kept in on-chip memory so that the model need not be loaded from off-chip memory repeatedly. These techniques reduce the demand for communication bandwidth, break
the "memory wall" bottleneck, and raise the inference speed of the potential energy surface at the hardware-algorithm level.

(2) An MD-specific heterogeneous parallel computing system is designed from the NvN chip and a general-purpose vN chip. The NvN chip serves as the slave device and greatly accelerates the most time-consuming part (>95% of runtime), potential-energy-surface inference. The vN chip serves as the master device and handles the remaining, less time-consuming calculations, giving the system functional generality. A high-speed communication interface connects the two devices, and techniques such as polling-based scheduling reduce communication overhead, allowing the NvN chip to run at full load and reach maximum computing efficiency. Accuracy and speed tests on multiple atomic systems validate the method's high accuracy, high speed, and generality. Compared with running MLMD on advanced computing hardware such as the Nvidia V100, this design improves computational speed, computational efficiency, and the maximum number of atoms in the system by 1-2 orders of magnitude.

(3) On the Aliyun computing platform, the heterogeneous system is extended to a multi-node large-scale MD parallel computing system. The proposed heterogeneous system is first implemented on each computing node to keep per-node efficiency high. Slurm is then deployed for task distribution and node scheduling, so that the nodes compute in a load-balanced manner and multi-node parallelism delivers further acceleration. Speed tests on multiple systems verify that this method is 1-2 orders of magnitude faster than running MLMD in parallel on Summit and other advanced supercomputing clusters at the same power-consumption level. With 64 computing nodes in parallel, the method achieves a speedup of 48 to 53
times, further extending its computing capability.

(4) A transfer-learning-based training method is proposed to address the large number of AIMD samples that high-throughput material screening requires for training NNP models. First, adding an atomic-force error term to the loss function reduces the number of samples required per material from the order of 10^5 to the order of 10^3-10^4. Then, knowledge from existing models of source materials is transferred to models of similar target materials, so that the target models reach high accuracy with only a small number of training samples (on the order of 10^2). For high-throughput material screening, the required number of AIMD samples is thereby reduced by 2-3 orders of magnitude.
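The combined energy-and-force loss described in innovation (4) can be sketched in plain numpy. This is a minimal illustration, assuming a simple weighted sum of mean-squared errors; the function name `nnp_loss` and the weights `w_e`, `w_f` are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def nnp_loss(e_pred, e_ref, f_pred, f_ref, w_e=1.0, w_f=1.0):
    """Weighted loss combining energy and atomic-force errors.

    Adding the force term is the mechanism the abstract credits with
    reducing the number of AIMD training samples per material; the
    weights here are placeholders, not the thesis's values.
    """
    loss_e = np.mean((e_pred - e_ref) ** 2)   # energy MSE for the frame
    loss_f = np.mean((f_pred - f_ref) ** 2)   # force MSE over all 3N components
    return w_e * loss_e + w_f * loss_f

# Toy frame: 4 atoms, reference forces plus small prediction noise
rng = np.random.default_rng(0)
f_ref = rng.normal(size=(4, 3))
f_pred = f_ref + 0.01 * rng.normal(size=(4, 3))
loss = nnp_loss(np.array(-10.02), np.array(-10.0), f_pred, f_ref)
```

Because the force term contributes 3N error components per frame versus one energy value, each AIMD frame carries far more training signal, which is consistent with the reported drop in required sample count.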
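The master/slave division of labor in innovation (2) cannot be reproduced without the NvN hardware, but the scheduling idea — the master submits inference work and polls for results so the device stays loaded — can be illustrated with a host-side sketch in which a worker thread stands in for the accelerator. All names and the placeholder "inference" computation here are hypothetical:

```python
import queue
import threading

def accelerator_worker(jobs, results):
    """Stand-in for the NvN inference device: consumes submitted
    jobs and posts results back to the master."""
    while True:
        job = jobs.get()
        if job is None:                          # shutdown sentinel
            break
        step, positions = job
        energy = sum(p * p for p in positions)   # placeholder "inference"
        results.put((step, energy))

jobs, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=accelerator_worker, args=(jobs, results))
worker.start()

energies = {}
for step in range(3):
    jobs.put((step, [0.1 * step, 0.2, 0.3]))     # master submits inference work
    while True:                                   # poll until the "device" replies
        try:
            s, e = results.get(timeout=1.0)
            energies[s] = e
            break
        except queue.Empty:
            pass                                  # master could do other MD work here

jobs.put(None)                                    # shut the worker down
worker.join()
```

In the real system the polling happens over the high-speed interface between the vN master and the NvN slave; the point of the sketch is only that the master never idles the accelerator between steps.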
Keywords/Search Tags: molecular dynamics, machine learning, non-von Neumann architecture, heterogeneous parallelism, computational acceleration