| The quantum many-body problem is one of the most fundamental and important problems in condensed matter physics. However, the exponentially large Hilbert space makes it extremely challenging to solve. Recently, the development of "AI for science" has brought new opportunities for solving the quantum many-body problem. In the work presented in this paper, we developed swAIQuMP, a quantum many-body simulation application based on a deep neural network, on the new generation of Sunway supercomputing platform, and scaled it to the whole machine (up to 40 million heterogeneous computing cores). This scale greatly increases the number of neural network model parameters and the number of Markov chain Monte Carlo samples, which in turn greatly improves the ability of the neural network to represent quantum many-body models. swAIQuMP achieves the current state of the art in both accuracy and scale of quantum many-body system simulation. The main work and achievements of this paper include the following aspects: (1) Based on the new generation of Sunway supercomputing platform, we designed and implemented the swAIQuMP application. First, we designed a three-layer abstract application hierarchy that eases use and functional extension and helps the application adapt to other hardware platforms in the future. Second, based on the parallel programming model of the Sunway accelerated computing architecture, we designed a three-level parallel framework comprising task-, thread-, and data-level parallelism, and implemented the core runtime of the application according to this framework. Finally, we proposed a highly scalable data partitioning strategy to address memory bottlenecks during computation. (2) We carried out multi-level and multi-granularity performance optimization for swAIQuMP: on the one hand, we carried out architecture-level 
optimization based on the many-core architecture of sw26010pro, including heterogeneous many-core parallel acceleration, CPE memory access optimization, and SIMD data-parallel optimization; on the other hand, we optimized the two core modules of the application at the algorithm level. In addition, we optimized the neural network training process for specific quantum many-body models. With this series of performance optimizations, the overall computing efficiency of the application has been greatly improved, and the time-to-solution has been reduced to a reasonable range. (3) We tested and analyzed the performance of the application on the new generation of Sunway supercomputing platform: on the one hand, we designed comparative experiments to measure the performance improvement brought by the optimizations; on the other hand, we designed scalability experiments and analyzed the strong and weak scalability of the application. Experiments show that swAIQuMP can be scaled to nearly 40 million computing cores with a parallel efficiency of more than 90%. (4) Using the computing power of the new generation of Sunway supercomputing platform, we solved the ground states of the J1-J2 model and the t-J model with high precision, achieving the current state of the art in both accuracy and the scale of the quantum system. |