
Performance Optimization Of Sparse Lower Triangular Solver On Sunway Architecture

Posted on: 2022-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Wang
Full Text: PDF
GTID: 2518306314474144
Subject: Software engineering
Abstract/Summary:
Numerical simulation and artificial intelligence are widely used in scientific research, engineering, and social life. While they bring great convenience to society, they also face increasingly severe demands on computing power. Solving large-scale sparse linear systems plays an important role in both fields, so improving solution speed is particularly critical. There are two main classes of methods for solving sparse linear systems, direct methods and iterative methods, and the sparse lower triangular solver is an important component of both; its parallel optimization can effectively improve the efficiency of solving sparse linear systems. In addition, heterogeneous systems have become the mainstream supercomputer architecture, and the increasing number of computing cores requires more refined thread-level and instruction-level parallel optimization. Parallel optimization of the sparse lower triangular solver on the Chinese home-grown Sunway architecture is therefore urgently needed. The solver suffers from a low compute-to-memory-access ratio, computational dependencies, write conflicts, and low parallelism, which makes its parallel optimization very challenging. The goal of this research is to implement a general math library function on the Sunway architecture for solving sparse lower triangular systems.

The contributions of this research are as follows. First, the sparse lower triangular solver is implemented on the sw26010, with its performance optimized and its functionality extended. After an existing algorithm is implemented, its functionality is extended to support multiple data types and matrix formats, the performance of its preprocessing stage is optimized, and an appropriate accuracy verification method is chosen to eliminate the interference caused by accumulated errors.

Furthermore, a parallel algorithm for the sparse lower triangular solver is designed and implemented for the new-generation Sunway architecture. Based on the new features of the architecture, the Sparse Level Tile layout and the producer-consumer pairing method are redesigned, which simplifies the solution process while preserving solution efficiency. Modifying the data structure of the Sparse Level Tile layout avoids redundant communication during the solve. An interface for handling multiple right-hand-side vectors is also implemented, which further improves solution efficiency.

Finally, the sparse lower triangular solver is performance-tested on both generations of the Sunway architecture, and an auto-tuning scheme is designed. The performance of the two versions of the solver is clearly related to the degree of parallelism, and also to factors such as the proportion of diagonal nonzero elements and the distribution of nonzero elements. The algorithm therefore needs to be configured according to the characteristics of the matrix, that is, auto-tuned.

In an evaluation on all 949 real and 32 complex square benchmarks from the SuiteSparse Matrix Collection, the proposed method outperforms the state-of-the-art methods for NVIDIA GPUs on over 73% of the benchmarks. The speedup is greater on large-scale benchmarks (matrices with more than 10000 rows and columns). On the sw26010, the proposed method achieves a geometric-mean speedup of 7.56 and a best speedup of 26 compared with the sequential method on the management processing element. For 20 right-hand-side vectors, it achieves a geometric-mean speedup of 10.54 and a best speedup of 65. On the new-generation Sunway architecture, it achieves a geometric-mean speedup of 6.51 and a best speedup of 34. In short, the sparse lower triangular solver implemented in this thesis on both generations of the Sunway architecture performs well in terms of both functionality and performance.
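
To make the computational dependencies and limited parallelism mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the thesis's Sunway implementation and it does not reproduce the Sparse Level Tile layout; it assumes a CSR-stored lower triangular matrix and uses the generic level-set idea. The function names `sptrsv_csr` and `level_sets` and the 4x4 example matrix are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def sptrsv_csr(indptr: Sequence[int], indices: Sequence[int],
               data: Sequence[float], b: Sequence[float]) -> List[float]:
    """Solve L x = b by sequential forward substitution (L lower triangular, CSR)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                # Row i needs x[j] from an earlier row: this is the dependency.
                acc -= data[k] * x[j]
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


def level_sets(indptr: Sequence[int], indices: Sequence[int]) -> List[List[int]]:
    """Group rows into levels; rows within one level do not depend on each other."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    sets: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


# Hypothetical example:
# L = [[2, 0, 0, 0],
#      [0, 1, 0, 0],
#      [1, 0, 4, 0],
#      [0, 3, 1, 5]]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
b = [2.0, 3.0, 5.0, 15.0]

x = sptrsv_csr(indptr, indices, data, b)
levels = level_sets(indptr, indices)
```

In this example rows 0 and 1 can be solved concurrently, while rows 2 and 3 must each wait for earlier rows. The number of rows per level is one simple measure of the degree of parallelism that the abstract links to solver performance.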
Keywords/Search Tags:sparse lower triangular solver, Sunway many-core processor, parallel optimization, auto-tuning