Optimizing Sparse Matrix-Vector Multiplication Based On OpenCL

Posted on:2013-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Zhao

GTID:2248330371985578

Subject:Software engineering

Abstract/Summary:

In the past few years, GPUs have come to play an important role in massively parallel computing. Measured per unit of chip area and per unit of power consumption, the computing power of a GPU is much higher than that of a CPU. Programming large-scale parallel computations on the GPU used to be a major challenge, but vendors now provide more efficient and intuitive development platforms, such as CUDA and OpenCL, that let more developers program GPUs more easily and efficiently. OpenCL has become an important GPGPU solution.

Sparse matrix-vector multiplication (SpMV) is one of the numerical kernels most widely used in mathematics and engineering, and many numerical algorithms can be reduced to matrix computations, for example in image processing, engineering and information science, solving linear equations, the fast Fourier transform, and optimization. Optimizing SpMV can therefore improve the performance of engineering and scientific applications.

With GPU computing as its background, this thesis first introduces the development history of the GPU and some basic concepts. Second, it describes the architecture of AMD's GPU series, starting from the hardware, because understanding the hardware platform prepares the ground for optimization. It then introduces the OpenCL architecture through its four models: the platform model, the execution model, the memory model, and the programming model.

For the optimization based on the CSR format, we first configure the OpenCL development environment, that is, set up the SDK's header files and libraries. Because the computations for any two rows of the matrix have no data or logical dependency on each other, the traditional serial algorithm can be parallelized: the iterations of its outer loop can be processed in parallel. We implement three methods: one thread processes one row, one wavefront processes one row, and a compromise between the two. With one thread per row, we found problems with load balancing and with the contiguity of memory accesses; having one wavefront process each row solves them. For matrices in which most rows have far more than 64 nonzero elements, we found that one wavefront per row does not perform very well, so we adopt the compromise between one thread per row and one wavefront per row.

Finally, the thesis gives a brief summary and looks ahead to future work on SpMV optimization on heterogeneous platforms.
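The row-to-worker mappings described above can be illustrated with a small CPU-side sketch. It is not the thesis's OpenCL code: the function names, the `lanes` parameter, and the strided partition followed by a tree reduction are assumptions chosen for illustration. With `lanes=1` it mimics one thread per row, with `lanes=64` one wavefront per row (64 is assumed as the wavefront width), and an intermediate power of two stands in for one possible form of the compromise.

```python
WAVE_SIZE = 64  # assumed wavefront width; not taken from the thesis code


def spmv_csr_serial(row_ptr, col_idx, values, x):
    """Reference CSR SpMV, y = A * x, with the classic two nested loops."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # outer loop: rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_csr_lanes_per_row(row_ptr, col_idx, values, x, lanes=WAVE_SIZE):
    """Emulate `lanes` cooperating work-items per row (hypothetical helper).

    `lanes` must be a power of two so the tree reduction below is valid.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * lanes
        # Lane i handles elements start+i, start+i+lanes, ... so that
        # neighbouring lanes touch neighbouring memory locations.
        for lane in range(lanes):
            for k in range(start + lane, end, lanes):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction of the per-lane partial sums into partial[0].
        stride = lanes // 2
        while stride > 0:
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y[row] = partial[0]
    return y


# Example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y_serial = spmv_csr_serial(row_ptr, col_idx, values, x)
y_thread = spmv_csr_lanes_per_row(row_ptr, col_idx, values, x, lanes=1)
y_group = spmv_csr_lanes_per_row(row_ptr, col_idx, values, x, lanes=4)
y_wave = spmv_csr_lanes_per_row(row_ptr, col_idx, values, x, lanes=WAVE_SIZE)
```

All four calls produce the same vector. On a GPU the mappings would differ in how evenly work is spread across lanes and in how contiguous the memory accesses are, which this emulation does not measure.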

Keywords/Search Tags:

OpenCL, GPU parallel computing, Sparse matrix vector multiplication, optimization

Related items

1	Parallel Algorithms And Architectures For Matrix Computations On FPGA
2	Research And Application Of Parallel Sparse Diagonal Matrix-vector Multiplication Algorithm On GPU
3	Sparse Matrix Vector Multiplication Based On CPU And GPU
4	The Research Of Hybrid Schedulling Model Based On Sparse Matrix And Parallel Algorithm
5	High Efficient Matrix Operations On Vector-SIMDE DSPs
6	Optimization And Realization For Sparse Matrix-Vector Multiplication On FPGA
7	Research On Sparse Matrix-Vector Multiplication And Convex Hull Algorithm Based On GPU
8	Design And Implementation Of GPU-based Sparse Matrix Operation Optimization System
9	Design And Optimization Of Parallel Algorithm Based On MIC Many-core Architecture
10	Study Of Parallel Computing System For Model Predictive Control