| As the first general-purpose computational software developed in China for physics, chemistry, materials science, and related fields, ABACUS is widely used. With the rapid development of high-performance computing, new high-memory, high-bandwidth, high-throughput hardware devices such as GPUs and DCUs continue to emerge to meet growing scientific research needs. ABACUS already implements MPI multi-process parallelism in its numerical atomic orbital part, but its performance remains insufficient for calculations on large molecular systems. To take full advantage of this emerging accelerator hardware and of software optimization methods, ABACUS urgently needs further optimization to improve its computing performance. This thesis proposes three optimization schemes. First, a hybrid multi-thread and multi-process strategy is adopted, and the application is rewritten for thread-level parallelism according to the characteristics of the hardware. Second, the memory access pattern of the non-local pseudopotential functions is optimized, which reduces memory access latency and improves computing efficiency. Finally, CPU+DCU heterogeneous optimization is adopted to make full use of the parallel computing performance of DCUs and other GPU devices. The correctness and applicability of the proposed optimization strategies are verified by tests on three platforms: Intel, AMD, and Hygon. Experimental results show that the optimized ABACUS software significantly improves computational performance in system calculations. The main work and contributions of this thesis are as follows: (1) MPI+OpenMP hybrid optimization. Performance analysis and serial code optimization were carried out on the numerical atomic orbital section of ABACUS. Performance analysis tools such as VTune, TAU, and perf were used to identify the function hotspots of ABACUS, and, building on the serial code optimization, fine-grained parallel optimization was applied to this part of the code. (2) Memory access optimization of the non-local pseudopotential part. Building on the previous chapter, MPI+OpenMP+memory access pattern optimization is applied to the non-local pseudopotential part. To address the runtime memory bottleneck in system calculations, a shared-memory mode is proposed. (3) CPU+DCU heterogeneous optimization. Building on the previous two optimization methods, a CPU+DCU heterogeneous optimization algorithm is proposed. A multi-DCU-accelerated non-local pseudopotential algorithm is preliminarily implemented on the Hygon platform to overcome the CPU peak-performance limit and to provide a practical basis for solving larger macromolecular system problems. |