
Design And Implementation Of GPU-based Sparse Matrix Operation Optimization System

Posted on: 2020-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qin
Full Text: PDF
GTID: 2428330611499667
Subject: Computer technology
Abstract/Summary:
With the increasing adoption of artificial intelligence, algorithms represented by convolutional and deep neural networks have found wide application. To improve prediction accuracy, the scale of neural networks has grown dramatically, and these huge networks introduce an enormous amount of computation. Network sparsification can greatly reduce this computation, but it leaves the neurons and the connection-weight matrices highly sparse and irregular. A GPU+CPU heterogeneous architecture can compute large-scale neural networks efficiently, yet the irregular structure of sparse weight matrices causes the work assigned to different threads to differ significantly; the resulting thread idling degrades performance and can even offset the gains brought by sparsification.

This thesis designs and implements a GPU-based sparse matrix operation optimization system. Since a fully connected operation is essentially a sparse matrix-vector multiplication (SpMV), sparse neural network computation is accelerated by optimizing SpMV and convolution. First, the row partitioning of a compressed sparse matrix is modeled as an iterative multi-way integer partitioning problem, and an integer-partitioning SpMV is proposed that balances the computational load across threads as evenly as possible and alleviates thread idling. Second, a merged-storage SpMV is proposed to achieve coalesced memory access and data reuse, improving memory-access efficiency and exploiting parallelism across both rows and columns. Third, a self-defined convolution function is designed using shared memory, aligned access, and a sparse storage format to optimize the traditional convolution operation. Finally, the traditional convolution operation is transformed into matrix multiplication, the sparse storage format is used to optimize the self-defined function, and convolution is accelerated through optimized sparse matrix multiplication. The influence of sparsity and storage format on convolution is also measured, providing guidance for selecting sparse matrix storage formats in neural networks.

Building on this research, the overall system architecture is designed with the above optimization algorithms as its core functions; the functional modules of the sparse matrix operation optimization system are implemented and each module is tested. Experiments on public datasets show that the integer-partitioning strategy reduces the time of the fully connected operation, the merged-storage SpMV achieves an average 8x speedup, and the self-defined convolution functions achieve the best convolution performance on highly sparse matrices.
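The abstract models compressed-row partitioning as an iterative multi-way integer partitioning problem that balances per-thread work. The thesis's exact algorithm is not given here; the NumPy sketch below illustrates the general idea under an assumed greedy longest-processing-time heuristic, pairing a reference CSR SpMV with a partition of rows by nonzero count:

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """Reference sparse matrix-vector product y = A @ x, A in CSR form."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        y[row] = data[start:end] @ x[indices[start:end]]
    return y

def balance_rows(indptr, k):
    """Greedy multi-way partition (assumed heuristic): assign each row to
    the least-loaded of k workers so per-worker nonzero counts are close."""
    nnz_per_row = np.diff(indptr)
    loads = [0] * k
    groups = [[] for _ in range(k)]
    # longest-processing-time order: heaviest rows first
    for row in np.argsort(nnz_per_row)[::-1]:
        w = loads.index(min(loads))
        groups[w].append(int(row))
        loads[w] += int(nnz_per_row[row])
    return groups, loads
```

With a balanced partition, each GPU thread (or warp) processes rows totalling roughly the same number of nonzeros, which is what alleviates thread idling.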
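The final optimization transforms the traditional convolution into matrix multiplication. A common way to do this is im2col lowering; the sketch below is a generic illustration of that transformation (not the thesis's self-defined function), unfolding image patches into columns so that one matrix product replaces the sliding window:

```python
import numpy as np

def im2col(img, kh, kw):
    """Unfold every kh x kw patch of a 2-D image into a column."""
    H, W = img.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_matmul(img, kernel):
    """Convolution (cross-correlation, as in neural networks) via im2col."""
    kh, kw = kernel.shape
    cols = im2col(img, kh, kw)
    out = kernel.ravel() @ cols  # one matrix product replaces the window loop
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    return out.reshape(oh, ow)
```

Once convolution is expressed as a matrix product, a sparse kernel turns it into exactly the sparse matrix multiplication that the system optimizes.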
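The abstract also reports that a sparse storage format benefits convolution most at high sparsity. As a minimal illustration of why (assuming COO-style storage of nonzeros, not necessarily the format chosen in the thesis), the convolution below iterates only over the kernel's nonzero weights, so a highly sparse kernel skips most multiplications:

```python
import numpy as np

def conv2d_sparse_kernel(img, kernel):
    """Convolution with the kernel stored sparsely: only nonzero
    weights contribute, each via one shifted scaled add of the image."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    rs, cs = np.nonzero(kernel)  # COO coordinates of nonzero weights
    for r, c, w in zip(rs, cs, kernel[rs, cs]):
        out += w * img[r:r + oh, c:c + ow]
    return out
```

The work scales with the number of nonzeros rather than the kernel size, which matches the observation that the sparse format wins for highly sparse matrices.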
Keywords/Search Tags: GPU, Sparse Matrix Vector Multiplication, Convolution Operation, Partitioning Strategy, Storage Format