
Research on Solving Sparse Linear Systems with a GPU-Based Preconditioned Conjugate Gradient Parallel Optimization Method

Posted on: 2016-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: P Ding
Full Text: PDF
GTID: 2308330479994815
Subject: Software engineering

Abstract/Summary:
As the performance of the graphics processing unit (GPU) has improved dramatically, applications such as computer simulation and image processing have advanced rapidly, and the GPU has also become a capable platform for general-purpose computing beyond graphics. In scientific computing, iterative methods for solving large sparse linear systems are of great practical importance in applications such as urban atmospheric pollution modeling, video processing, fluid and structural mechanics, materials simulation, and biomedicine.

Computationally intensive numerical algorithms face the memory-wall problem, especially when parallelized on the GPU; sparse matrix-vector multiplication (SpMV) is a typical representative of this class of problems. To address it, we adopt a compressed storage format for the sparse matrix, design a GPU-based parallel SpMV algorithm, and then accelerate the preconditioned conjugate gradient (PCG) method through CPU-GPU cooperation. This paper obtains the following research results.

First, we show that the most time-consuming step of the PCG algorithm when solving sparse linear systems is the SpMV computation. Guided by the characteristics of the GPU and the multi-threaded parallel computing requirements of symmetric matrices, we accelerate SpMV through optimizations of the storage format, memory hierarchy, task partitioning, and thread mapping.

Second, we propose an optimization strategy for the PCG method on the GPU.
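To make the SpMV kernel discussed above concrete, here is a minimal serial sketch of SpMV in the compressed sparse row (CSR) format (this is illustrative code, not the thesis implementation; the example matrix is an assumption). On a GPU, the outer loop over rows is what gets mapped to parallel threads:

```python
# CSR format: values holds the nonzeros row by row, col_idx their column
# indices, and row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros.
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):  # on a GPU, typically one thread (or warp) per row
        y[i] = sum(values[k] * x[col_idx[k]]
                   for k in range(row_ptr[i], row_ptr[i + 1]))
    return y

# Illustrative symmetric matrix A = [[4, 1, 0],
#                                    [1, 3, 0],
#                                    [0, 0, 2]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 0, 1, 2]
values  = [4.0, 1.0, 1.0, 3.0, 2.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # → [6.0, 7.0, 6.0]
```

CSR stores only the nonzeros plus O(n) row pointers, which is why it is the standard input format for GPU SpMV kernels; the irregular, x-indexed memory accesses it produces are exactly the memory-wall issue the storage-format and thread-mapping optimizations target.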
For GPU coprocessors, we accelerate the computation with thread-to-matrix mapping, data coalescing, data reuse, hiding the high latency of GPU global-memory accesses through effective thread scheduling, and hiding CPU-GPU data transfers through asynchronous computation.

Third, we propose an optimization strategy for the PCG method on the Xeon Phi. For Xeon Phi processors, we reduce data dependences, apply vectorization, and hide Xeon Phi-CPU data transfers through asynchronous computation.

In summary, this paper proposes a new parallel conjugate gradient algorithm for solving linear systems with large-scale sparse matrices. We parallelize and optimize it on both GPU and Xeon Phi processors, and validate its feasibility and correctness. We then compare the efficiency of several methods; the GPU outperformed the Xeon Phi in our test cases. Finally, on a public test set we evaluate the two optimization methods presented in this paper against other recent optimizations; the results show that ours achieve better performance and are broadly applicable.
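For reference, the overall algorithm being accelerated can be sketched as follows. This is a minimal serial PCG with a Jacobi (diagonal) preconditioner, not the thesis implementation (the thesis does not specify its preconditioner; Jacobi is assumed here for illustration, and the test matrix is made up). The `csr_spmv` call inside the loop is the dominant cost that the GPU and Xeon Phi optimizations target:

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x for a CSR-format sparse matrix A."""
    n = len(row_ptr) - 1
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(n)]

def pcg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive-definite CSR matrix,
    using a Jacobi preconditioner M = diag(A) (an illustrative choice)."""
    n = len(b)
    diag = [0.0] * n  # extract the diagonal for the preconditioner
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    x = [0.0] * n
    r = b[:]                                    # r0 = b - A x0, with x0 = 0
    z = [r[i] / diag[i] for i in range(n)]      # z = M^{-1} r
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = csr_spmv(row_ptr, col_idx, values, p)   # the SpMV bottleneck
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Illustrative SPD system: A = [[4,1,0],[1,3,0],[0,0,2]], b = [1,2,3]
x = pcg([0, 2, 4, 5], [0, 1, 0, 1, 2], [4.0, 1.0, 1.0, 3.0, 2.0],
        [1.0, 2.0, 3.0])
print(x)
```

Note that aside from the SpMV, each iteration consists only of dot products and vector updates, which parallelize straightforwardly; this is why the thesis concentrates its GPU optimization effort on the SpMV kernel and on hiding data-transfer latency.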
Keywords/Search Tags:Preconditioned conjugate gradient method, SpMV, CSR, GPU, Xeon Phi