The Block Gauss-Seidel/Jacobi Preconditioned Subspace Iterative Method For Many-core System

Posted on: 2019-10-16  Degree: Master  Type: Thesis
Country: China  Candidate: L L Wu
GTID: 2428330623450960  Subject: Engineering
Abstract/Summary:
The growing demand for computing power in large-scale scientific and engineering applications is pushing computer systems toward Exaflops performance. While Exaflops systems provide enormous computing power, they also bring great challenges and opportunities to traditional parallel numerical algorithms. As a solver for large-scale nonlinear systems of equations, the NKS (Newton-Krylov-Schwarz) algorithm is widely used in parallel computing. On future Exaflops systems, the NKS algorithm will face a global communication bottleneck and insufficient parallelism for heterogeneous many-core architectures, both of which seriously affect its performance and scalability. This thesis studies block Gauss-Seidel/Jacobi preconditioned Krylov subspace methods in three respects:

1. To address the global communication bottleneck in Krylov subspace methods, a new performance model based on the LogP model is proposed that quantifies this bottleneck at large node counts. On this basis, a non-blocking communication optimization for vector inner products, vector norms, and normalization in Krylov methods (including GMRES, Chebyshev, Richardson, and TCQMR) is proposed and implemented in the PETSc high-performance toolkit. The MPI_Allreduce and MPI_Iallreduce interfaces of MPICH are benchmarked against each other on the MilkyWay-2 supercomputer. The results show that once the number of processes reaches 1,024, MPI_Iallreduce clearly outperforms MPI_Allreduce, and the gap widens as the number of processes grows, up to 65,536. Finally, the 2D nonlinear driven cavity problem with multigrid is used on MilkyWay-2 to verify the optimization. The results show that the optimized Krylov subspace methods scale well: once the number of processes reaches 1,024, they improve performance by 16% to 26%.

2. For problems on unstructured meshes, a block Gauss-Seidel/Jacobi preconditioner for heterogeneous many-core architectures is proposed and implemented as the subdomain solver of a domain decomposition method. The convergence of the block Gauss-Seidel/Jacobi algorithm is proved mathematically. Simulation of the flow around a high-speed train is used as a test case on a supercomputer with a heterogeneous many-core architecture. The results show that the basic block Gauss-Seidel/Jacobi preconditioner has a good preconditioning effect and good scalability. Compared with the serial Gauss-Seidel algorithm, the basic block Gauss-Seidel/Jacobi algorithm delivers a 2.86x speedup in the preconditioning stage.

3. Building on the basic block Gauss-Seidel/Jacobi algorithm, this thesis designs optimization strategies including multiple-line copying, computation-communication overlap, and a numerical optimization that lowers communication complexity. Numerical experiments show that the optimized block Gauss-Seidel/Jacobi algorithm with low communication complexity achieves a 4.16x speedup over the serial Gauss-Seidel algorithm. In terms of parallel efficiency, the block Gauss-Seidel/Jacobi algorithm achieves 61% as the number of processors increases from 1,040 to 33,280.
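The first contribution rests on a LogP-based cost model of global reductions. This abstract does not give the thesis's actual model, so what follows is only a minimal sketch under assumed simplifications: a recursive-doubling allreduce of one scalar, hypothetical LogP parameters measured in microseconds, and one reduction per iteration whose latency a non-blocking call (in the spirit of MPI_Iallreduce) can hide behind independent local work. The names `LogP`, `allreduce_time`, and `iteration_time` are illustrative and are not MPI or PETSc APIs.

```python
import math
from dataclasses import dataclass


@dataclass
class LogP:
    """Hypothetical LogP machine parameters, all in microseconds."""
    L: float  # network latency of one small message
    o: float  # CPU overhead to send, or to receive, one message
    g: float  # minimum gap between consecutive message injections


def allreduce_time(p: int, m: LogP) -> float:
    """Approximate cost of a recursive-doubling allreduce of one scalar.

    Each of the ceil(log2 p) rounds is one send/receive exchange costing
    about o + L + o, or g if the injection gap is the larger limit.
    """
    if p <= 1:
        return 0.0
    rounds = math.ceil(math.log2(p))
    return rounds * max(m.o + m.L + m.o, m.g)


def iteration_time(p: int, m: LogP, t_overlap: float, t_after: float,
                   nonblocking: bool) -> float:
    """Time of one Krylov iteration containing a single global reduction.

    t_overlap: local work that does not need the reduction result.
    t_after:   local work that must wait for the reduction result.
    """
    t_red = allreduce_time(p, m)
    if nonblocking:
        # Non-blocking style: start the reduction, do independent work, then wait.
        return max(t_red, t_overlap) + t_after
    # Blocking style: the reduction and the independent work are serialized.
    return t_red + t_overlap + t_after


if __name__ == "__main__":
    machine = LogP(L=2.0, o=0.5, g=0.1)  # assumed values, not measurements
    for p in (1024, 65536):
        blocking = iteration_time(p, machine, 20.0, 50.0, nonblocking=False)
        overlapped = iteration_time(p, machine, 20.0, 50.0, nonblocking=True)
        print(p, blocking, overlapped)
```

With these assumed numbers the script prints `1024 100.0 80.0` and `65536 118.0 98.0`: the reduction cost grows with log2 p, and overlap hides up to `t_overlap` of it. The model ignores the CPU time a real non-blocking collective may need for progress, so it is an optimistic bound rather than a prediction.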
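The second contribution uses a block Gauss-Seidel/Jacobi preconditioner as the subdomain solver. The abstract does not spell out the exact scheme. A common reading, assumed here, splits the unknowns into contiguous row blocks (for instance one per core), sweeps each block with Gauss-Seidel, and couples blocks Jacobi-style through values from the previous sweep, so that all blocks can be processed concurrently. The pure-Python sketch below applies such sweeps to a CSR matrix; `block_gs_jacobi` and `tridiag_csr` are hypothetical helper names.

```python
def block_gs_jacobi(indptr, indices, data, b, blocks, sweeps=1, x=None):
    """Hybrid block Gauss-Seidel/Jacobi sweeps for A x = b, A stored in CSR.

    blocks: list of (start, end) row ranges that together cover all rows.
    Within a block, rows use the newest values (Gauss-Seidel); values owned
    by other blocks are read from the previous sweep (Jacobi), so blocks are
    independent. Assumes every row has a nonzero diagonal entry.
    """
    n = len(b)
    x = [0.0] * n if x is None else list(x)
    for _ in range(sweeps):
        x_old = list(x)  # snapshot read by all blocks during this sweep
        for start, end in blocks:  # independent blocks: could run in parallel
            for i in range(start, end):
                s, diag = b[i], 0.0
                for k in range(indptr[i], indptr[i + 1]):
                    j = indices[k]
                    if j == i:
                        diag = data[k]
                    elif start <= j < end:
                        s -= data[k] * x[j]      # same block: latest value
                    else:
                        s -= data[k] * x_old[j]  # other block: previous sweep
                x[i] = s / diag
    return x


def tridiag_csr(n, d, off):
    """CSR arrays of the n x n tridiagonal matrix with diagonal d, off-diagonals off."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                indices.append(j)
                data.append(d if j == i else off)
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    indptr, indices, data = tridiag_csr(6, 4.0, -1.0)
    b = [3.0, 2.0, 2.0, 2.0, 2.0, 3.0]  # A times all-ones, so the solution is all ones
    blocks = [(0, 3), (3, 6)]           # e.g. two cores with three rows each
    z = block_gs_jacobi(indptr, indices, data, b, blocks, sweeps=1)  # one preconditioner application
    x = block_gs_jacobi(indptr, indices, data, b, blocks, sweeps=60)
    print(z)
    print(max(abs(v - 1.0) for v in x))
```

One sweep from a zero initial guess is the kind of cheap approximate solve a preconditioner performs. With a single block covering all rows the routine reduces to plain Gauss-Seidel, and with one row per block it reduces to Jacobi; the block size trades sequential coupling against parallelism.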
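To make the 61% efficiency figure concrete, the arithmetic can be worked through under an assumption the abstract does not state: strong scaling (fixed problem size) with the 1,040-processor run as the baseline. The helper names below are illustrative.

```python
def relative_efficiency(p_base: int, t_base: float, p: int, t_p: float) -> float:
    """Strong-scaling efficiency of a run on p processors, relative to a
    baseline run on p_base processors (fixed problem size assumed)."""
    return (t_base * p_base) / (t_p * p)


def implied_speedup(p_base: int, p: int, efficiency: float) -> float:
    """Speedup over the baseline run that a given relative efficiency implies."""
    return efficiency * p / p_base


if __name__ == "__main__":
    scale = 33280 / 1040                          # 32x more processors
    speedup = implied_speedup(1040, 33280, 0.61)
    print(scale, round(speedup, 2))               # 32.0 19.52
```

Under this assumption, going from 1,040 to 33,280 processors would make the run about 19.5x faster rather than the ideal 32x. If the experiment was instead weak scaling (problem size grown with the processor count), efficiency would be `t_base / t_p` and no speedup figure follows from it.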
Keywords/Search Tags: Exaflops System, NKS Algorithm, Global Communication Bottleneck, Heterogeneous Many-core Architecture, Block Gauss-Seidel/Jacobi Preconditioner