
Design And Implementation Of GPU-based Sparse Matrix Operation Optimization System

Posted on: 2020-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qin
Full Text: PDF
GTID: 2428330611499667
Subject: Computer technology
Abstract/Summary:
With the increasing adoption of artificial intelligence, algorithms represented by convolutional and deep neural networks have found wide application. To improve prediction accuracy, the scale of neural networks has grown dramatically, and these huge networks introduce an enormous amount of computation. Network sparsification can greatly reduce this computation, but it leaves the neurons and the connection-weight matrices highly sparse and irregular. A GPU+CPU heterogeneous architecture can compute large-scale neural networks efficiently, yet the irregular structure of sparse weight matrices causes the work assigned to different threads to differ significantly; the resulting thread idling degrades performance and can even offset the gains brought by sparsification.

This thesis designs and implements a GPU-based sparse matrix operation optimization system. Since a fully connected operation is essentially a sparse matrix-vector multiplication (SpMV), sparse neural network computation is accelerated by optimizing SpMV and convolution. First, the row partitioning of a compressed sparse matrix is modeled as an iterative multi-way integer partitioning problem, and an integer-partitioning SpMV is proposed that balances the computational load across threads as evenly as possible and alleviates thread idling. Second, a merged-storage SpMV is proposed to achieve coalesced memory access and data reuse, improving memory-access efficiency and exploiting parallelism across both rows and columns. Third, a self-defined convolution function is designed using shared memory, aligned access, and a sparse storage format to optimize the traditional convolution operation. Finally, the traditional convolution operation is transformed into matrix multiplication, the sparse storage format is used to optimize the self-defined function, and convolution is accelerated through optimized sparse matrix multiplication. The influence of sparsity and storage format on convolution is also measured, providing guidance for selecting sparse matrix storage formats in neural networks.

Building on this research, the overall system architecture is designed with the above optimization algorithms as its core functions; the functional modules of the sparse matrix operation optimization system are implemented and each module is tested. Experiments on public datasets show that the integer-partitioning strategy reduces the time of the fully connected operation, the merged-storage SpMV achieves an average 8x speedup, and the self-defined convolution functions achieve the best convolution performance on highly sparse matrices.
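The abstract models compressed-row partitioning as an iterative multi-way integer partitioning problem that balances per-thread work. The thesis's exact algorithm is not given here; the NumPy sketch below illustrates the general idea under an assumed greedy longest-processing-time heuristic, pairing a reference CSR SpMV with a partition of rows by nonzero count:

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """Reference sparse matrix-vector product y = A @ x, A in CSR form."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        y[row] = data[start:end] @ x[indices[start:end]]
    return y

def balance_rows(indptr, k):
    """Greedy multi-way partition (assumed heuristic): assign each row to
    the least-loaded of k workers so per-worker nonzero counts are close."""
    nnz_per_row = np.diff(indptr)
    loads = [0] * k
    groups = [[] for _ in range(k)]
    # longest-processing-time order: heaviest rows first
    for row in np.argsort(nnz_per_row)[::-1]:
        w = loads.index(min(loads))
        groups[w].append(int(row))
        loads[w] += int(nnz_per_row[row])
    return groups, loads
```

With a balanced partition, each GPU thread (or warp) processes rows totalling roughly the same number of nonzeros, which is what alleviates thread idling.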
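The final optimization transforms the traditional convolution into matrix multiplication. A common way to do this is im2col lowering; the sketch below is a generic illustration of that transformation (not the thesis's self-defined function), unfolding image patches into columns so that one matrix product replaces the sliding window:

```python
import numpy as np

def im2col(img, kh, kw):
    """Unfold every kh x kw patch of a 2-D image into a column."""
    H, W = img.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_matmul(img, kernel):
    """Convolution (cross-correlation, as in neural networks) via im2col."""
    kh, kw = kernel.shape
    cols = im2col(img, kh, kw)
    out = kernel.ravel() @ cols  # one matrix product replaces the window loop
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    return out.reshape(oh, ow)
```

Once convolution is expressed as a matrix product, a sparse kernel turns it into exactly the sparse matrix multiplication that the system optimizes.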
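The abstract also reports that a sparse storage format benefits convolution most at high sparsity. As a minimal illustration of why (assuming COO-style storage of nonzeros, not necessarily the format chosen in the thesis), the convolution below iterates only over the kernel's nonzero weights, so a highly sparse kernel skips most multiplications:

```python
import numpy as np

def conv2d_sparse_kernel(img, kernel):
    """Convolution with the kernel stored sparsely: only nonzero
    weights contribute, each via one shifted scaled add of the image."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    rs, cs = np.nonzero(kernel)  # COO coordinates of nonzero weights
    for r, c, w in zip(rs, cs, kernel[rs, cs]):
        out += w * img[r:r + oh, c:c + ow]
    return out
```

The work scales with the number of nonzeros rather than the kernel size, which matches the observation that the sparse format wins for highly sparse matrices.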
Keywords/Search Tags: GPU, Sparse Matrix Vector Multiplication, Convolution Operation, Partitioning Strategy, Storage Format