
Optimizing Sparse Matrix-Vector Multiplication Based on OpenCL

Posted on: 2013-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zhao
Full Text: PDF
GTID: 2248330371985578
Subject: Software engineering
Abstract/Summary:
In the past few years, GPUs have come to play an important role in massively parallel computing. Measured per unit of chip area and per unit of power consumed, the computing power of a GPU is much higher than that of a CPU. Programming large-scale parallel computations on the GPU used to be a major challenge, but vendors now provide more efficient and intuitive development platforms, such as CUDA and OpenCL, that let more developers program the GPU easily and efficiently. OpenCL has become an important GPGPU solution.

Sparse matrix-vector multiplication (SpMV) belongs to a class of numerical algorithms widely used in mathematics and engineering. Many other numerical algorithms can be converted to matrix computations, for example in image processing, information science and engineering, the solution of linear equations, the fast Fourier transform, and optimization. Optimizing sparse matrix-vector multiplication can therefore improve the performance of scientific and engineering applications.

This thesis first takes GPU computing as its background and introduces the development history of the GPU and some basic concepts. Second, it describes the architecture of AMD's GPU series, starting from the hardware, because understanding the hardware platform prepares the ground for optimization. It then introduces the OpenCL architecture through its four models: the platform model, the execution model, the memory model, and the programming model.

To implement the optimization based on the CSR format, we first configure the OpenCL development environment, that is, set up the SDK header files and libraries. The computations on any two rows of the matrix have no data or logic dependence on each other, so the traditional serial algorithm can be parallelized: the iterations of the outer loop in the serial code can be processed in parallel. We implement three methods: one thread processes one row, one wavefront processes one row, and a compromise between the two. With one thread per row we found problems with load balancing and with the contiguity of memory accesses; processing each row with one wavefront solves them. For matrices in which the number of nonzero elements in most rows is far larger than 64, we found that one wavefront per row does not perform well, so we adopt the compromise between one thread per row and one wavefront per row.

Finally, the thesis gives a brief summary and looks ahead to future work on SpMV optimization for heterogeneous platforms.
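The row-level parallelism described in the abstract can be illustrated with a small simulation. The following is only a sketch in plain Python, not the thesis's OpenCL kernels: nothing runs on a GPU, and the function names, the `lanes_per_row` parameter, and the example matrix are all hypothetical. It assumes a wavefront of 64 work-items, and it assumes that the "compromise" means a fixed group of work-items smaller than a wavefront sharing one row, which the abstract does not actually specify.

```python
# Plain-Python simulation of how CSR SpMV work could be divided among
# GPU work-items. All names and the example data are illustrative.

WAVEFRONT_SIZE = 64  # assumed wavefront width (the "64" in the abstract)


def spmv_csr_serial(row_ptr, col_idx, values, x):
    """Reference serial CSR SpMV: y = A * x."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):  # outer loop: rows are independent
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def spmv_csr_lanes(row_ptr, col_idx, values, x, lanes_per_row):
    """CSR SpMV in which `lanes_per_row` simulated work-items share a row.

    lanes_per_row == 1               -> one thread per row
    lanes_per_row == WAVEFRONT_SIZE  -> one wavefront per row
    anything in between              -> the assumed compromise
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):  # on a GPU, rows would run concurrently
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * lanes_per_row
        for lane in range(lanes_per_row):
            # This lane handles elements start+lane, start+lane+lanes_per_row, ...
            # so neighbouring lanes touch neighbouring entries of `values`.
            for k in range(start + lane, end, lanes_per_row):
                partial[lane] += values[k] * x[col_idx[k]]
        y[row] = sum(partial)  # stands in for the reduction within the group
    return y


# Example matrix (hypothetical):
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 5, 6],
#  [0, 0, 0, 7]]
ROW_PTR = [0, 2, 3, 6, 7]
COL_IDX = [0, 2, 1, 0, 2, 3, 3]
VALUES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X = [1.0, 2.0, 3.0, 4.0]

y_ref = spmv_csr_serial(ROW_PTR, COL_IDX, VALUES, X)
y_thread = spmv_csr_lanes(ROW_PTR, COL_IDX, VALUES, X, 1)
y_wave = spmv_csr_lanes(ROW_PTR, COL_IDX, VALUES, X, WAVEFRONT_SIZE)
y_mixed = spmv_csr_lanes(ROW_PTR, COL_IDX, VALUES, X, 16)
```

All four calls produce the same vector; only the division of each row's work changes. With `lanes_per_row` set to 1, a long row is handled entirely by one work-item, which is where the load-balancing problem comes from. With larger values, a short row leaves most lanes with nothing to do.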
Keywords/Search Tags: OpenCL, GPU parallel computing, sparse matrix-vector multiplication, optimization