
Design of a Solver for Large-Scale Sparse Linear Equations Based on CUDA

Posted on: 2014-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: C J Wu
Full Text: PDF
GTID: 2268330401465884
Subject: Electronic and communication engineering
Abstract/Summary:
Solving large-scale linear systems of equations has long been an important task in scientific computation. With the development of graphics processing unit (GPU) hardware architectures, the GPU has evolved into a general-purpose computing device. Acting as a coprocessor to the CPU, the GPU takes over large-scale, data-intensive computing tasks; its computing power is roughly equivalent to that of a small cluster, but at lower power consumption and cost. In 2007, NVIDIA released a new parallel computing platform named CUDA, which reduced the difficulty of GPU-accelerated computing and made it far more convenient for researchers to apply the GPU to their problems. As a result, more and more scientific research fields now use GPU computing.

Based on CUDA, this thesis implements a solver for large-scale sparse linear equations on the GPU. The Compressed Sparse Row (CSR) format is used to store the large sparse matrix, and the conjugate gradient (CG) algorithm is adopted to solve the system. In every CG iteration, several scaled vector additions, dot products, and one sparse matrix-vector multiplication (SpMV) are computed on the GPU. The dot-product and SpMV kernels were written by the author, while the vector additions are performed with the CUBLAS library. Because the sparse matrix is symmetric positive definite, only its upper triangle is stored in CSR format. The SpMV is decomposed into separate multiplication and addition operations over the upper-triangular data, and the contribution of the lower triangle is obtained from that same upper-triangular CSR data. In the kernel design, atomic operations prevent different threads from reading or writing the same address at the same time. Since the GPU provides 6 GB of global memory in total, the large matrix is divided into several blocks, each smaller than the available global memory, so that the data processed by the GPU fits its capacity on every pass; this improves GPU utilization. In the dot-product operation, the reduction offset is changed to avoid shared-memory bank conflicts, which improves performance; this kernel outperforms the CUBLAS dot product. Finally, Jacobi preconditioning is added to the CG algorithm (JPCG) to improve the convergence rate.

Performance was analyzed on several sparse matrices of different sizes. Compared with CPU execution, the GPU achieves a speedup of 30 for the SpMV operation, 6 for the dot product, and up to 46 for the overall solver, showing that solving the equations on the GPU yields much better efficiency. The program also adapts automatically to sparse matrices of different sizes. The partitioning algorithm designed according to GPU capacity outperforms one that divides the sparse matrix into uniform blocks. SpMV using only the stored upper triangle is slower than SpMV over all elements of the matrix, but upper-triangle storage allows much larger matrices to be handled than storing all elements. Finally, the JPCG algorithm converges faster than the plain CG algorithm.
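
To make the storage scheme concrete, here is a minimal sketch of a CSR container and of how the upper triangle of a small symmetric matrix would be laid out; the struct and field names are illustrative, not taken from the thesis.

```cuda
// Minimal CSR container for the upper triangle of a symmetric sparse matrix.
// The struct and field names are illustrative, not taken from the thesis.
#include <vector>

struct CsrMatrix {
    int n;                       // matrix dimension
    std::vector<int>    rowPtr;  // size n+1: offset of each row in col/val
    std::vector<int>    col;     // column index of each stored nonzero
    std::vector<double> val;     // value of each stored nonzero
};

// Example: the upper triangle (with diagonal) of
//     | 4 1 0 |
//     | 1 5 2 |
//     | 0 2 6 |
// is stored as
//     rowPtr = {0, 2, 4, 5}
//     col    = {0, 1,  1, 2,  2}
//     val    = {4, 1,  5, 2,  6}
```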
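
The symmetric SpMV described above could be realized along the following lines: each thread walks one stored row of the upper triangle and mirrors every off-diagonal entry into the corresponding lower-triangle contribution, using atomicAdd in the spirit of the abstract's atomic operations. This is a sketch, not the thesis's actual kernel.

```cuda
// Sketch of y = A*x when only the upper triangle of a symmetric A is stored
// in CSR. One thread per row; each off-diagonal entry (i, j) also contributes
// val * x[i] to y[j], the mirrored lower-triangle term, accumulated with
// atomicAdd so that concurrent threads never corrupt the same address.
// Note: native double-precision atomicAdd requires compute capability 6.0+;
// older devices need a compare-and-swap emulation.
__global__ void spmvSymUpperCsr(int n,
                                const int*    __restrict__ rowPtr,
                                const int*    __restrict__ col,
                                const double* __restrict__ val,
                                const double* __restrict__ x,
                                double* y)  // y must be zeroed before launch
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    double sum = 0.0;
    for (int k = rowPtr[i]; k < rowPtr[i + 1]; ++k) {
        int j = col[k];
        sum += val[k] * x[j];                 // upper-triangle term of row i
        if (j != i)
            atomicAdd(&y[j], val[k] * x[i]);  // mirrored lower-triangle term
    }
    atomicAdd(&y[i], sum);  // other rows may also add into y[i]
}
```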
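
One plausible way to realize the capacity-driven partitioning is a greedy host-side pass over the row pointers, as sketched below; the function, the per-entry byte costs, and the default 6 GB budget are all assumptions for illustration, not code or figures from the thesis.

```cuda
// Hypothetical greedy partitioning: pack consecutive rows into a block until
// adding another row would exceed the memory budget. Byte costs assumed:
// 8-byte value + 4-byte column index per nonzero, 4 bytes per row pointer.
#include <cstddef>
#include <vector>

std::vector<int> partitionRows(const std::vector<int>& rowPtr,
                               std::size_t budgetBytes = 6ULL << 30)
{
    const std::size_t bytesPerNnz = sizeof(double) + sizeof(int);
    std::vector<int> splits{0};  // row indices where each block starts
    std::size_t used = 0;
    for (std::size_t i = 1; i < rowPtr.size(); ++i) {
        std::size_t rowBytes =
            (rowPtr[i] - rowPtr[i - 1]) * bytesPerNnz + sizeof(int);
        if (used + rowBytes > budgetBytes) {  // this row opens a new block
            splits.push_back(static_cast<int>(i) - 1);
            used = 0;
        }
        used += rowBytes;
    }
    splits.push_back(static_cast<int>(rowPtr.size()) - 1);  // sentinel: n
    return splits;
}
```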
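
The bank-conflict-free dot product is commonly implemented with a "sequential addressing" reduction, which matches the abstract's description of changing the offset during the reduce step. A minimal sketch, with an assumed 256-thread block size and illustrative kernel name:

```cuda
// Per-block partial dot product using a sequential-addressing reduction.
// Halving the stride each step ("changing the offset") keeps active threads
// reading consecutive shared-memory words, so no two threads in a warp hit
// the same bank.
__global__ void dotPartial(int n, const double* x, const double* y,
                           double* blockSums)
{
    __shared__ double cache[256];
    int tid = threadIdx.x;

    // Grid-stride loop so one launch covers a vector of any length.
    double sum = 0.0;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x)
        sum += x[i] * y[i];
    cache[tid] = sum;
    __syncthreads();

    // Conflict-free tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        blockSums[blockIdx.x] = cache[0];  // summed on host or in a 2nd pass
}
```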
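
Finally, the Jacobi preconditioner itself is inexpensive on the GPU, since applying M = diag(A) is one element-wise division per component, which is why it adds almost no cost per iteration while improving convergence for this symmetric positive definite system. A minimal kernel (names illustrative) might look like this:

```cuda
// Jacobi preconditioner apply: z = M^{-1} r with M = diag(A).
__global__ void applyJacobi(int n, const double* __restrict__ diagA,
                            const double* __restrict__ r, double* z)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = r[i] / diagA[i];
}
```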
Keywords/Search Tags: GPU, CUDA, Parallel Computing, CG, JPCG