
Optimization And Research Of SpMV Algorithm Based On DCU Accelerator

Posted on: 2023-01-12 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Bo | Full Text: PDF
GTID: 2568306623493394 | Subject: Software engineering
Abstract/Summary:
Sparse matrix-vector multiplication (SpMV) is a core algorithm in scientific computing, widely used both in traditional scientific and engineering research and in emerging sparse neural network techniques. With the rise of heterogeneous computing in high-performance computing, numerical computation based on SpMV has achieved remarkable speedups on heterogeneous platforms. In China's new generation of high-performance heterogeneous computing platforms, the DCU, an independently developed accelerator, has been deployed in large numbers. On these platforms, mathematical function libraries, of which SpMV is representative, still have porting and tuning problems that need to be addressed. This thesis studies the computation and memory access patterns of SpMV and ports the serial SpMV algorithm to a CPU+DCU heterogeneous platform. Drawing on the DCU memory hierarchy and parallel programming model, it proposes a performance optimization method with good scalability, which supports sparse matrix-vector multiplication libraries on heterogeneous high-performance computing platforms. The main work and contributions of this thesis are as follows:

(1) Porting SpMV to the DCU accelerator. The sparse matrix storage formats and the mathematical formulation of SpMV are analyzed, and a serial SpMV in CSR format is implemented. A combination of dynamic and static program performance analysis is used to identify the hot code segment, whose data dependences and data access patterns are then analyzed to determine the feasibility of parallelization and the computational bottleneck. On this basis, taking the hardware characteristics of the DCU into account and following the HIP programming model, loop indices are replaced with thread indices, and the serial SpMV is ported to the DCU accelerator.

(2) The SCSR algorithm, based on heterogeneous SpMV with a row-partitioning strategy. To address the inefficient data access and low thread parallelism of the heterogeneous SpMV algorithm, row partitioning and grouped computation are used to achieve coalesced memory access and high thread concurrency. Further optimization, guided by the algorithm's access characteristics and the hardware memory model, controls the layout of highly reused data to reduce memory access cycles and improve access efficiency. Loop partitioning adds more independent instructions to hide instruction and memory latency. A dynamic scheduling and summation strategy lets multiple threads compute concurrently and improves thread execution efficiency.

(3) The M-SCSR algorithm, based on SCSR, for multiple DCU accelerators with coarse- and fine-grained MPI+HIP parallelism. It is proposed to overcome the storage and peak-compute limits of a single accelerator and to shorten the long run time for sparse matrices at the million scale. The model uses a master-slave mode to allocate tasks sensibly among processes, partitions data by rows to reduce data communication between devices, and uses the partition feature of the Haiguang 1 CPU to control the DCUs so that data transfers between host and device proceed concurrently.

The work in this thesis has been implemented and verified on domestic DCU accelerators. The results show that, compared with the SpMV algorithm in the hipSPARSE library, the SCSR algorithm achieves a speedup of more than 1.5x on a single DCU accelerator. For sparse matrices with a scale of more than one million, the M-SCSR algorithm using four DCU accelerators achieves a 3.45x speedup over the SCSR algorithm. The experimental results verify the generality, effectiveness, and scalability of this series of SpMV algorithms.
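As a rough illustration of the serial CSR baseline described in contribution (1), the following C++ sketch computes y = A * x row by row. It is not the thesis code: the `CsrMatrix` struct, its field names, and the functions `spmv_row` and `spmv_csr` are hypothetical names chosen for this example. The per-row work is factored into its own function to suggest how a loop index could be swapped for a thread index when porting to an accelerator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container; the field names are illustrative only.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<double> values;        // value of each nonzero
};

// Dot product of one matrix row with x. Rows are independent of each other,
// so on an accelerator `row` could come from a thread index instead of a loop counter.
double spmv_row(const CsrMatrix& a, const std::vector<double>& x, std::size_t row) {
    double sum = 0.0;
    for (std::size_t k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
        sum += a.values[k] * x[a.col_idx[k]];
    }
    return sum;
}

// Serial y = A * x over all rows.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t row = 0; row < a.rows; ++row) {
        y[row] = spmv_row(a, x, row);
    }
    return y;
}
```

The indirect read `x[a.col_idx[k]]` is the irregular access that typically makes SpMV memory-bound, which is consistent with the abstract's emphasis on memory access optimization.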
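The row-wise data partitioning across devices mentioned in contribution (3) might be sketched as below. The abstract does not say how rows are balanced among DCUs; this example assumes contiguous row blocks with roughly equal nonzero counts, which is one common choice and not necessarily the thesis's scheme. The function name `partition_rows_by_nnz` is hypothetical, and the sketch expects a valid CSR `row_ptr` (at least one entry) and `parts >= 1`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Splits rows into `parts` contiguous blocks with roughly equal nonzero counts.
// Returns parts + 1 boundaries: block p owns rows [bounds[p], bounds[p + 1]).
// Assumption: balancing by nonzeros is an illustrative policy, not taken from the thesis.
std::vector<std::size_t> partition_rows_by_nnz(const std::vector<std::size_t>& row_ptr,
                                               std::size_t parts) {
    const std::size_t rows = row_ptr.size() - 1;
    const std::size_t nnz = row_ptr[rows];
    std::vector<std::size_t> bounds(parts + 1, rows);
    bounds[0] = 0;
    for (std::size_t p = 1; p < parts; ++p) {
        // Ideal number of nonzeros preceding block p.
        const std::size_t target = nnz * p / parts;
        // First row whose starting offset reaches the target (row_ptr is sorted).
        const auto it = std::lower_bound(row_ptr.begin(), row_ptr.end(), target);
        bounds[p] = std::min(static_cast<std::size_t>(it - row_ptr.begin()), rows);
    }
    return bounds;
}
```

Because each block holds whole rows, every device can compute its slice of y without exchanging partial sums; only the input vector and the output slices need to move between host and devices.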
Keywords/Search Tags: DCU accelerator, SpMV algorithm, Heterogeneous computing, MPI hybrid programming, Memory access optimization