
HPC-AI Convergence Computing Method For Exponentially Complex Quantum Many-Body Problem

Posted on: 2024-09-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M F Li
GTID: 1520306932457704
Subject: Computer system architecture
Abstract/Summary:
For decades, the quantum many-body problem, which studies complex quantum entanglement and exotic quantum behavior among particles, has been one of the most central and challenging research fields in modern physics. Its difficulty stems from the fact that the underlying Hilbert space grows exponentially with the number of particles (the "exponential wall"). With the advent of scientific intelligence computing methods, the remarkable advantages of deep learning methods such as neural networks in high-dimensional feature modeling and compressed representation shed new light on numerous exponentially complex challenges in scientific computing. However, leveraging neural network methods to solve the quantum many-body problem requires a tight integration of high performance computing and artificial intelligence at the level of computing methods and algorithm design. Because the two fields have traditionally developed relatively independently, integrating them is challenging for both algorithm design and performance optimization. Consequently, there is an urgent need for more effective high performance computing and artificial intelligence (HPC-AI) converged computing methods that fit existing software and hardware environments.

This thesis takes the exponentially complex quantum many-body problem as a research case and investigates the crucial aspects of HPC-AI converged computing in terms of solution methodology, algorithm design, and implementation. By analyzing the performance bottlenecks and algorithmic constraints of data-driven neural network methods on contemporary heterogeneous parallel systems, it designs a converged computing method that leverages the strengths of both high performance computing and artificial intelligence, with an emphasis on solving the quantum many-body problem more effectively on prevalent heterogeneous parallel systems. The research work and results of this thesis cover the following four aspects:

1. Data sampling in the converged algorithm of neural networks and Markov chain Monte Carlo (MCMC). This thesis presents a two-level parallel MCMC sampling algorithm that combines inter-process MPI parallelism with an intra-process batch strategy, addressing the limited amount of neural network computation available along a single Markov chain. The proposed method exploits the performance of heterogeneous parallel systems to achieve high-throughput neural network quantum state sampling. By maintaining numerous independent Markov chains and separating the neural network calculations from the Markov process, the serial calculations along the chain, originally iterated step by step in time, are organized in batch form. These calculations are combined with the underlying operator library to adapt to high performance heterogeneous systems. Experimental results on the new generation of Sunway supercomputer demonstrate that the two-level parallel sampling algorithm significantly improves the computing performance of the neural network. It maintains 90% parallel efficiency on nearly 40 million heterogeneous cores, supporting parallel MCMC sampling of over 10 million independent Markov chains. On a supercomputing cluster equipped with GPUs, the algorithm also maintains a strong-scaling efficiency of 94% on nearly one million cores.

2. Model optimization using a neural network as the wave-function ansatz in the variational Monte Carlo method. This thesis establishes a collaborative computing framework based on large batches of high-quality data and a scalable stochastic reconfiguration algorithm to support high-precision, scalable variational wave-function optimization. This approach overcomes numerous local minima during model optimization and achieves a numerical accuracy far higher than that of traditional deep learning tasks (10^-3 to 10^-5). The thesis proposes a data quality assessment algorithm based on sample energy to identify and discard erroneous data in MCMC sampling, thereby improving data quality. Additionally, it solves the scalability problem of the stochastic reconfiguration algorithm by implementing a 2D-mesh parallel calculation of the correlation matrix. The experimental results indicate that the collaborative computing framework leverages the strengths of both high performance computing and artificial intelligence to greatly increase the problem size that existing algorithms can solve, pushing the spin system size to 16×16, which corresponds to a Hilbert space of dimension 2^256.

3. The randomness problem in solving the quantum many-body problem with neural network methods. This thesis proposes two initial state optimization algorithms, based on transfer learning and on parallel selection, which address the tendency of models trained from random initial states to become trapped in local minima. In the J1-J2 model, transfer learning is adopted to bridge the differences between tasks, sharing common physical laws across multiple dimensions such as symmetry, system scale, and J2 values. In the t-J model, sample energy is used to assess the quality of an initial state, and this approach is adapted to the two-level parallel MCMC sampling framework described above. This allows higher-quality initial states to be selected efficiently in parallel from numerous candidates. The experimental results show that the proposed initial state optimization methods significantly improve the state-of-the-art result on the 18×18 system. For the first time, the thesis reports the ground state energy of the 24×24 system, and it realizes exponentially complex quantum many-body simulations with Hilbert space dimensions of 2^1296 for the spin system and 3^144 for the fermion system.

4. Computational organization in solving the quantum many-body problem on a heterogeneous platform. This thesis adopts a fine-grained dataflow model to uniformly organize the key procedures in the HPC-AI converged program, improving the efficiency of single-sample gradient calculation and of matrix factorization through subgraph partitioning and task scheduling algorithms, respectively. For neural network computing, subgraph partitioning is employed to divide the computational graph, fully exploit the parallelism between operators, and adapt to the unique large-memory execution mode of the sw26010pro architecture. For matrix factorization, the thesis designs a two-level scheduler that matches the hierarchical structure of the GPU, with one level handling global task distribution and the other handling local task scheduling. Additionally, it proposes two heuristic scheduling algorithms on a prototype dataflow runtime system. Experimental results reveal that the proposed subgraph partitioning and priority scheduling algorithms effectively harness fine-grained parallelism and improve the computing performance of the subprocedures.

This thesis centers on HPC-AI converged computing and takes solving the exponentially complex quantum many-body problem with a neural network method as its main line. By combining the strengths of high performance computing and artificial intelligence, the HPC-AI converged computing methods are designed to overcome the challenges posed by existing software and hardware environments. They obtain the ground state energy of the quantum many-body problem with remarkable accuracy and at unprecedented scales, with Hilbert spaces as large as 2^1296 for the spin system and 3^144 for the fermion system. As the first successful case of solving an AI for Science problem with ultra-large-scale parallel training on the new generation of Sunway supercomputer, this work is of significant importance for the development of deep learning software environments on domestic supercomputing systems.
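The intra-process batch strategy described in the first aspect can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`log_amplitude_batch`, `batched_metropolis_step`, `sample`), the linear toy model standing in for the neural network, and the single-spin-flip proposal are all assumptions made for illustration. The point shown is only the organization: every chain proposes a move, the model is evaluated once on the whole batch of proposals, and each chain then accepts or rejects independently.

```python
import math
import random


def log_amplitude_batch(configs, weights):
    """Toy stand-in for a neural network (assumption): log|psi(s)| = sum_i w_i * s_i.

    Takes a whole batch of spin configurations and returns one value per chain.
    """
    return [sum(w * s for w, s in zip(weights, c)) for c in configs]


def batched_metropolis_step(configs, log_amps, weights, rng):
    """Advance every independent Markov chain by one Metropolis step."""
    # 1. Each chain proposes flipping one randomly chosen spin.
    proposals = []
    for c in configs:
        site = rng.randrange(len(c))
        p = list(c)
        p[site] = -p[site]
        proposals.append(p)

    # 2. A single batched model evaluation covers all chains at once,
    #    instead of one small evaluation per chain per time step.
    new_log_amps = log_amplitude_batch(proposals, weights)

    # 3. Each chain accepts with probability min(1, |psi'|^2 / |psi|^2).
    for k in range(len(configs)):
        log_ratio = 2.0 * (new_log_amps[k] - log_amps[k])
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            configs[k] = proposals[k]
            log_amps[k] = new_log_amps[k]
    return configs, log_amps


def sample(num_chains, num_sites, num_steps, weights, seed=0):
    """Run num_chains independent chains for num_steps batched steps."""
    rng = random.Random(seed)
    configs = [[rng.choice((-1, 1)) for _ in range(num_sites)]
               for _ in range(num_chains)]
    log_amps = log_amplitude_batch(configs, weights)
    for _ in range(num_steps):
        configs, log_amps = batched_metropolis_step(configs, log_amps, weights, rng)
    return configs


if __name__ == "__main__":
    final = sample(num_chains=64, num_sites=8, num_steps=50, weights=[0.5] * 8)
    mean_spin = sum(s for c in final for s in c) / (64 * 8)
    print("mean spin over all chains:", mean_spin)
```

In a two-level scheme of the kind the abstract describes, each MPI process would presumably run its own batch of chains in this way with a distinct seed. That outer level is omitted here because the abstract does not specify how it is implemented.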
Keywords/Search Tags: High performance computing, Deep learning, Quantum many-body problem, Neural network quantum state, Heterogeneous system, Dataflow computing