
GPU-Based SpMV Parallel Acceleration and Performance Optimization

Posted on: 2024-05-07    Degree: Master    Type: Thesis
Country: China    Candidate: D Q Huang    Full Text: PDF
GTID: 2568307175987599    Subject: Computer technology
Abstract/Summary:
Sparse matrix-vector multiplication (SpMV) is a general and important operation that is widely used in many scientific fields. As the scale of data keeps growing, SpMV has become a computational hotspot in these fields. Traditional processors can no longer meet the demands of large-scale numerical computing, while the floating-point performance of GPUs far exceeds that of general-purpose CPUs, so CPU/GPU heterogeneous parallel SpMV algorithms are of significant research value. This thesis therefore studies SpMV from the following three aspects.

(1) To address the low memory-access bandwidth utilization and unbalanced multithreaded load of parallel SpMV, this thesis designs a new sparse matrix storage format, BEC, based on ELL. First, the value matrix and column-index matrix are divided into data blocks of uniform size determined by the average number of non-zero elements per row, and the last data block of each row is either padded with zeros or stored in COO format. The SpMV computation is then divided into three parts: multiply-add on the data blocks, reduction of the per-block partial results, and multiply-add on the COO entries, which yields a fine-grained, load-balanced SpMV (a sketch of this blocked scheme is given after the abstract). Using 20 sparse matrices as the benchmark suite, the experiments show that BEC SpMV achieves average speedups of 1.73x, 2.66x, and 1.54x over Merge, MKL, and CSR5, respectively; on the Nvidia GPU platform, its average speedups over Merge and cuSPARSE reach 1.71x and 2.17x, respectively.

(2) Because the same storage format yields different SpMV performance on different sparse matrices, this thesis labels each matrix with its optimal storage format according to the measured SpMV performance of five formats (CSR5, Merge, MKL, ALBUS, and BEC) and builds a training dataset from the matrices' characteristic parameters. Training decision tree, SVM, MLP, XGBoost, and LightGBM models on this dataset, the accuracy of the five machine learning algorithms in selecting the optimal SpMV storage format reaches 63.9%, 66.7%, 69.1%, 72.4%, and 72.1%, respectively.

(3) To improve the computational efficiency of numerical weather prediction, this thesis implements a parallel optimization of the generalized conjugate residual (GCR) algorithm for solving the Helmholtz equation on a CPU/GPU heterogeneous platform. The main techniques used to accelerate GCR are reducing the data transfer volume, improving memory-access contiguity, and using non-blocking communication (an overlap sketch is given after the abstract). On the Nvidia Tesla V100 GPU platform, the MPI+CUDA heterogeneous parallel implementation achieves a 4.69x speedup over the serial GCR program.
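The abstract does not spell out the exact BEC data layout, so the following CUDA sketch only illustrates the general scheme of aspect (1): fixed-size blocks carved out of ELL-style value and column-index arrays, per-block partial products reduced into the output vector, and a COO tail for leftover entries. All names here (bec_ell_spmv, block_vals, block_cols, block_row, coo_tail_spmv) are hypothetical, and atomic adds stand in for whatever block-result reduction the thesis actually implements.

```cuda
#include <cuda_runtime.h>

// Parts 1 and 2 of the blocked scheme: each thread handles one fixed-size
// block of a row's ELL region and reduces its partial sum into y.
// Assumes y is zero-initialized and the GPU supports double atomicAdd (sm_60+).
__global__ void bec_ell_spmv(int num_blocks, int block_size,
                             const double *block_vals,  // padded values, block-major
                             const int    *block_cols,  // padded column indices
                             const int    *block_row,   // owning row of each block
                             const double *x, double *y)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_blocks) return;

    const double *v = block_vals + (size_t)b * block_size;
    const int    *c = block_cols + (size_t)b * block_size;
    double sum = 0.0;
    for (int i = 0; i < block_size; ++i)
        sum += v[i] * x[c[i]];          // zero padding contributes nothing

    atomicAdd(&y[block_row[b]], sum);   // reduce block results per row
}

// Part 3: leftover entries of short rows stored in COO format.
__global__ void coo_tail_spmv(int nnz_tail, const int *row, const int *col,
                              const double *val, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz_tail)
        atomicAdd(&y[row[i]], val[i] * x[col[i]]);
}
```

With y zeroed, launching bec_ell_spmv over all blocks and coo_tail_spmv over the tail entries completes one SpMV; atomics keep the sketch short, while a production kernel would more likely use a segmented or warp-level reduction to avoid contention.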
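Likewise, for aspect (3) the abstract names reduced data transfer and non-blocking communication as the main GCR optimizations but gives no implementation details, so the routine below is only a generic MPI+CUDA overlap sketch; the function name, halo buffers, and neighbor ranks are assumptions, not the thesis's actual code.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Exchange halo values with non-blocking MPI while independent interior work
// proceeds on the GPU. Host buffers should be pinned (cudaHostAlloc) so the
// asynchronous copies can actually overlap; neighbor ranks may be MPI_PROC_NULL
// at domain boundaries.
void exchange_halo(const double *d_send, double *h_send,
                   double *h_recv, double *d_recv,
                   int halo_n, int left, int right, cudaStream_t stream)
{
    MPI_Request reqs[2];

    // Stage the outgoing halo on the host, then post non-blocking transfers.
    cudaMemcpyAsync(h_send, d_send, halo_n * sizeof(double),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    MPI_Irecv(h_recv, halo_n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(h_send, halo_n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    // ... interior GCR work that does not depend on the halo overlaps here ...

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    cudaMemcpyAsync(d_recv, h_recv, halo_n * sizeof(double),
                    cudaMemcpyHostToDevice, stream);
}
```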
Keywords/Search Tags: SpMV, GPU, Storage Format, Heterogeneous Parallelism