
Research on Accelerating the Krylov Subspace Method with GPU

Posted on: 2011-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Qin
Full Text: PDF
GTID: 2198330338489851
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, the architecture, programming model, and related aspects of the general-purpose graphics processing unit (GPGPU) have developed rapidly. The programmability of GPUs has improved steadily, and their peak performance has grown from several Gflops to several Tflops. The main application of GPUs has shifted from image processing to general-purpose computing, and more and more research focuses on accelerating scientific problems with GPUs. Among such problems, using iterative methods to solve large sparse linear systems is key in many fields, including fluid dynamics, numerical weather prediction, oil and seismic data processing, and materials simulation and design. The Krylov subspace method is an efficient algorithm for solving large sparse systems, combining low storage requirements with stable convergence. However, solving these problems on the CPU consumes a great deal of time and cannot meet practical requirements. Accelerating iterative methods for large sparse linear systems with the GPU not only reduces the solution time but also improves computing efficiency, and the approach can be generalized to speed up many practical problems.

After analyzing the GPU architecture and the CUDA programming model, we discuss the difficulties in adapting the Krylov subspace method to NVIDIA GPUs, including: the storage space wasted by sparse matrices; the implementation of sparse matrix-vector multiplication on the GPU; the implementation of the vector inner-product operation on the GPU; the performance loss caused by kernel launch overhead; the lack of a reduction operation on NVIDIA GPUs; and the need for GPU-specific optimization methods. Through the cooperation of the CPU and GPU, we propose novel methods to realize the algorithm on NVIDIA GPUs under the CUDA programming model.
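To make the setting concrete, the following is a minimal CPU reference sketch of the conjugate gradient method, the classic Krylov subspace solver for symmetric positive-definite sparse systems; the COO triple storage, function names, and the tridiagonal test system are illustrative assumptions, not details taken from the thesis.

```python
# Reference (CPU) sketch of conjugate gradient, a Krylov subspace solver.
# The sparse matrix is stored as (row, col, value) triples for clarity.

def spmv(triples, x):
    """Sparse matrix-vector product y = A @ x from COO triples."""
    y = [0.0] * len(x)
    for i, j, v in triples:
        y[i] += v * x[j]
    return y

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cg(triples, b, tol=1e-10, max_iter=1000):
    """CG builds the Krylov subspace span{r, Ar, A^2 r, ...} implicitly."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                     # residual r = b - A x (x starts at 0)
    p = r[:]                     # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        ap = spmv(triples, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Tridiagonal SPD test system: 2 on the diagonal, -1 off-diagonal.
n = 5
A = []
for i in range(n):
    A.append((i, i, 2.0))
    if i > 0:
        A.append((i, i - 1, -1.0))
    if i < n - 1:
        A.append((i, i + 1, -1.0))
b = [1.0] * n
x = cg(A, b)
residual = [bi - yi for bi, yi in zip(b, spmv(A, x))]
```

Each iteration costs one sparse matrix-vector product and a few inner products; these two kernels are exactly the operations the thesis moves onto the GPU.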
The main contributions of this paper are as follows:

We propose several algorithm optimization methods oriented to the CUDA programming model, based on a deep study of NVIDIA GPU features. First, the NVIDIA GPU has a complex memory hierarchy, including registers, shared memory, local memory, global memory, and two kinds of read-only memory: texture memory and constant memory. These memories differ greatly in speed and capacity, so when optimizing an algorithm we should make full use of the high-speed memories, such as shared memory, texture memory, and even the registers. In the CUDA programming model, one concept closely tied to the hardware is the warp: when a program runs on an NVIDIA GPU, the warp is the real unit of execution. In our optimization of the Krylov subspace method, we therefore try to align the work with warp or half-warp boundaries.

We describe a new compressed sparse-matrix format based on the DIA format, called CDIA. The CDIA format achieves higher bandwidth and satisfies the coalesced-access requirement: it includes a transposition of the matrix data, which yields higher memory bandwidth and improves the memory-access performance of the GPU. The CDIA format proved very effective in several important GPU kernels; when used in SpMV, it improved performance by about 50.3% over the ordinary DIA format.

We implement sparse matrix-vector multiplication on the GPU, making full use of the CDIA format under the CUDA programming model on an NVIDIA Tesla C1060, and giving each thread a fine-grained task distribution.
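The layout idea behind CDIA can be sketched in plain Python. This is a CPU reconstruction under the assumption (stated above) that CDIA is the transpose of the DIA diagonal array, so that "thread" i reads element (row i, diagonal d) from consecutive addresses, as coalesced access requires; the function names and the small test matrix are illustrative.

```python
# DIA stores one row of `data` per nonzero diagonal; the assumed CDIA
# layout transposes it, giving one row per matrix row so consecutive
# GPU threads would touch consecutive memory (coalesced access).

def to_dia(dense):
    """Collect the nonzero diagonals of a square matrix (DIA format)."""
    n = len(dense)
    offsets, data = [], []
    for off in range(-(n - 1), n):
        diag = [dense[i][i + off] if 0 <= i + off < n else 0.0
                for i in range(n)]
        if any(v != 0.0 for v in diag):
            offsets.append(off)
            data.append(diag)          # DIA: one row per diagonal
    return offsets, data

def dia_to_cdia(data):
    """CDIA: transpose the diagonal array -> one row per matrix row."""
    return [list(col) for col in zip(*data)]

def spmv_cdia(offsets, data_t, x):
    """y = A @ x; each loop iteration i mimics GPU thread i's work."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):                 # one "thread" per matrix row
        for d, off in enumerate(offsets):
            j = i + off
            if 0 <= j < n:
                y[i] += data_t[i][d] * x[j]
    return y

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
offsets, dia = to_dia(A)               # three diagonals: -1, 0, +1
cdia = dia_to_cdia(dia)
y = spmv_cdia(offsets, cdia, [1.0, 1.0, 1.0])
```

In the DIA layout, thread i's reads for a fixed diagonal are strided across rows; after the transpose they are contiguous per row, which is what raises the effective memory bandwidth on the GPU.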
By optimizing memory access, data transfer, and other aspects of the GPU implementation, our best SpMV achieves up to 39.6 Gflop/s in single precision and 19.6 Gflop/s in double precision, improving performance by about 7.6% and 17.4% over that of Bell and Garland.

We implement the vector inner-product operation on the GPU. The reduction operation is the basis of the inner product; by making full use of the GPU's shared memory, we increased the performance of the inner-product operation by 36.2 times in the single-precision case.

We propose several optimization measures for the Krylov subspace method on NVIDIA GPUs, covering the structure of the program, the communication between the CPU and GPU, and the use of high-speed memory, especially shared memory, to improve memory-access bandwidth; pinned memory can also be used. In our experiments, the best implementation achieves up to 11.3 Gflop/s in double precision and 16.7 Gflop/s in single precision, which is 9.2 and 13.57 times faster than on the CPU, respectively.

Different matrix dimensions yield different performance, which is determined by the GPU architecture; our experiments indicate that the GPU has great power in accelerating the solution of linear systems.
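The shared-memory reduction underlying the GPU inner product can be sketched as follows. This is a CPU model, not the thesis code: each "block" loads a slice of the element-wise products into a buffer standing in for shared memory, then halves the active stride each step, as a CUDA tree reduction would; the block size of 8 (a power of two) and all names are illustrative assumptions.

```python
# CPU sketch of a shared-memory tree reduction for the inner product.

def block_reduce(values):
    """Tree reduction over one block's shared-memory buffer."""
    shared = list(values)              # stand-in for __shared__ memory
    stride = len(shared) // 2
    while stride > 0:                  # halve the active "threads" each step
        for tid in range(stride):      # threads 0..stride-1 are active
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]

def inner_product(a, b, block_size=8):
    """Dot product: per-block partial sums, then a final reduction."""
    n = len(a)
    partial = []
    for start in range(0, n, block_size):
        block = [a[i] * b[i]
                 for i in range(start, min(start + block_size, n))]
        block += [0.0] * (block_size - len(block))   # pad the last block
        partial.append(block_reduce(block))
    return sum(partial)                # final pass (CPU or a second kernel)

a = [float(i) for i in range(16)]
b = [1.0] * 16
result = inner_product(a, b)           # sum of 0..15 = 120.0
```

On the GPU, the same pattern keeps all partial sums in fast shared memory instead of global memory, which is the source of the single-precision speedup reported above.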
Keywords/Search Tags: GPU, CUDA, Krylov Subspace Method, Sparse Matrix-Vector Multiplication, vector inner-product operation