
Research On GPU Program Optimization Technology For Sparse Data

Posted on: 2019-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: G L Li
Full Text: PDF
GTID: 2428330548459210
Subject: Engineering
Abstract/Summary:
With the continuous development of information technology, ever more data and tasks need to be processed by computers. To increase the execution speed of computer programs, more and more companies and research institutes have begun to design high-performance parallel applications that combine the central processing unit (CPU) and the graphics processing unit (GPU). In recent years, GPUs have been widely deployed in large computing clusters such as data centers and high-performance computing centers, and they are also integrated into many embedded devices such as smartphones, self-driving cars, and smart cameras. GPU vendors such as NVIDIA and AMD provide the CUDA and OpenCL programming languages so that users can write GPU parallel programs more easily. Because the architecture of the GPU differs from that of the CPU, writing high-performance GPU code often requires experience and specialized optimization techniques, so programs written by non-professional users frequently leave considerable room for optimization. To take full advantage of data sparsity in deep learning, data mining, and other application scenarios, this thesis presents an in-depth study of GPU program optimization technology for sparse data.

This thesis proposes a GPU program optimization method for sparse data consisting of two strategies: immediate substitution and sparse-constant optimization. We analyze the optimization process at two code levels, source code and object code, and illustrate the advantages of optimizing at the object-code level. We design a template-based immediate-number substitution method that generates sparse object code by placing template data in the source code. Applying the idea of sparse-constant optimization, we design two sparse object-code optimization methods, one based on PTX and one based on cubin. To make full use of the advantages of both object-code optimization methods, this thesis proposes a performance model that can analyze the cost and benefit of each method and guide the GPU program optimization process.

We further propose a formal description of GPU programs based on graph models, representing a GPU program as nested directed acyclic graphs (DAGs). Using these graph models, we analyze the dynamic optimization opportunities in GPU programs and design algorithms that compute the optimization time that can be hidden in each part of a program. Building on this work, this thesis proposes a GPU program optimization framework for sparse data. The framework transforms a GPU program into a corresponding program execution diagram, then applies the PTX-based object-code optimization method for static optimization and the cubin-based object-code optimization method for dynamic optimization, with the performance model guiding which optimization method to use during the process.

In the experimental part, this thesis takes LeNet-5 as an example to analyze the sparse-data optimization opportunities in deep learning. We construct the program execution diagrams for the training and prediction processes of LeNet-5 and show how to optimize both processes in deep-learning scenarios by applying the optimization framework. To verify the performance of the program optimization method, we selected typical convolution operations from LeNet, Alex CIFAR-10, and ResNet to test the performance of the
program optimization method. The experimental results show that the convolution operations optimized with the GPU program optimization framework in this thesis achieve the highest performance in the dynamic-optimization scenario, up to 1.6 times that of cuDNN, the industry-standard high-performance library. In the static-optimization scenario, performance reaches up to 6.9 times that of cuDNN. The optimized convolution operations also achieve a 10-70× performance improvement over cuSPARSE, the high-performance sparse-matrix computing library. The program optimization framework in this thesis thus delivers a significant and general performance benefit and can be widely applied in sparse-data scenarios.
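The template-based immediate-number substitution strategy described above can be illustrated with a small sketch. The placeholder names, the PTX fragment, and the helper function below are hypothetical and not taken from the thesis; the sketch only shows the core mechanics: burn each sparse constant into the object code as an immediate, and elide any instruction whose constant is exactly zero.

```python
import struct

def f32_imm(v):
    """Encode a float as a PTX-style hex immediate (0fXXXXXXXX)."""
    return "0f" + struct.pack(">f", v).hex().upper()

def specialize(template_lines, weights):
    """Substitute immediates for weight placeholders such as {W0};
    drop any instruction whose weight is exactly zero (the sparsity win)."""
    out = []
    for line in template_lines:
        keep = True
        for name, value in weights.items():
            token = "{" + name + "}"
            if token in line:
                if value == 0.0:
                    keep = False  # multiply/FMA by zero: elide the instruction
                else:
                    line = line.replace(token, f32_imm(value))
        if keep:
            out.append(line)
    return out

# Hypothetical PTX fragment with weight placeholders.
template = [
    "mul.f32 %f3, %f1, {W0};",
    "fma.rn.f32 %f4, %f2, {W1}, %f3;",
]
print(specialize(template, {"W0": 0.0, "W1": 1.0}))
```

Note that a real implementation must also repair the data flow after eliding an instruction (here, `%f3` would no longer be defined); this sketch covers only the substitution step.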
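The nested-DAG program model and the "hidden optimization time" computation can likewise be sketched. The kernel names and durations below are invented for illustration; the idea is standard critical-path scheduling on the kernel dependency graph: a node's slack (latest minus earliest start) bounds how much dynamic-optimization work can be overlapped with it without lengthening the overall run.

```python
from graphlib import TopologicalSorter

def dag_slack(dur, edges):
    """Earliest/latest-start scheduling on a kernel DAG (durations in ms).
    A node's slack is idle time that dynamic optimization can hide behind."""
    preds = {n: set() for n in dur}
    succs = {n: set() for n in dur}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    order = list(TopologicalSorter(preds).static_order())
    earliest = {}
    for n in order:  # forward pass: earliest possible start
        earliest[n] = max((earliest[p] + dur[p] for p in preds[n]), default=0)
    makespan = max(earliest[n] + dur[n] for n in dur)
    latest = {}
    for n in reversed(order):  # backward pass: latest start without delay
        latest[n] = min((latest[s] for s in succs[n]), default=makespan) - dur[n]
    return {n: latest[n] - earliest[n] for n in dur}

# Hypothetical four-kernel program: A feeds B and C, which both feed D.
durations = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 1.0}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(dag_slack(durations, edges))
```

Here kernel C lies off the critical path, so optimization work up to its slack can run concurrently with B without increasing total execution time.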
Keywords/Search Tags: GPU program optimization, sparse data, deep learning