Font Size: a A A

Parallelization And Locality Optimization For Red-black Gauss-Seidel Stencil

Posted on:2022-12-09Degree:MasterType:Thesis
Country:ChinaCandidate:Y R JiFull Text:PDF
GTID:2518306743987159Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Stencil(stencil computing)is a common type of loop-nested computing model,also known as "structured grid computing",and is one of the thirteen Berkeley-defined high-performance computing core models.Designing various new block and vectorization schemes that satisfy the data dependencies defined by Stencil computing is the key to improving the parallelism and locality of Stencil computing and thus the overall performance of the program.At present,a large number of efficient block and vectorization methods have been proposed for Stencil,but most of them are for Jacobi-type Stencil with high parallelism.Gauss-Seidel type Stencil has better convergence speed and is widely used in multi-grid calculations,but the data dependence of this type of Stencil is more complex,and parallelism and locality optimization are more difficult.This paper designs a new parallel block algorithm and vectorization scheme for Gauss-Seidel Stencil with red and black sorting,which improves data locality and program parallelism.The main contributions are as follows:1.A parallel block tiling algorithm suitable for Gauss-Seidel is proposed.Fine-grained reconstruction for Gauss-Seidel red-black sorting is carried out on the original time dimension and iterative space,and the red-black grid point order of a new type of special-shaped block and block boundary that satisfies the red-black sorting is designed in the extended iterative space,and Further use of cross-judgment is generalized to be applicable to any problem size,block size and block start position,and the final design and implementation of Stencil's medium-granularity block algorithm has lightweight and concise loop conditions,and better use the cache level.structure with better multi-core inter-core parallelism.2.Propose a fine-grained vectorization scheme suitable for the above Stencil block algorithm.Design a new data layout scheme,separate storage space dimension and grid points in data space,design a corresponding vectorization scheme,reduce redundant calculation and eliminate the spatial data conflict problem of input data in vectorization of Stencil calculation,Satisfy the characteristics and dependencies of the Gauss-Seidel Stencil block algorithm.3.Efficient implementation and comparative verification analysis of a typical Gauss-Seidel Stencil.In the case of multi-core parallelism,the one-dimensional to three-dimensional performance of this algorithm is improved by 3.23 times,4.15 times,and 2.35 times,respectively.24 cores are 12.94 times,14.56 times,and 21.09 times higher than serial in one-dimensional,twodimensional,and three-dimensional,respectively.When the single-core data size is 221,the performance gap is 10.62 times.In the comparison of different block sizes,the performance of the improved algorithm is 5.60 times that of the existing algorithm.In the scalability test,the performance on a single CPU is 3.23 times that of the original algorithm,the performance on multiple CPUs is up to 3.91 times that of the original algorithm,and the average speedup ratio is3.43 times.The Seidel Stencil chunking method has better in-core fine-grained parallelism.
Keywords/Search Tags:Stencil Computation, Blocking, Gauss-Seidel
PDF Full Text Request
Related items