Convolutional neural networks (CNNs) are widely used in computer vision because of their excellent performance in image processing. With the popularization of GPUs, the rapid development of GPU programming, and the explosive growth of GPU computing power, CPU-GPU heterogeneous deep learning systems have become a common platform for deploying CNN models. However, in such CPU-GPU heterogeneous systems, CNN inference is usually performed entirely on the GPU, so the computing resources of the CPU are not fully utilized. Moreover, during GPU computing, the GPU has to interact with the CPU to perform control operations, which introduces interaction delay. A CNN is not indivisible; it is composed of many different layers. In this thesis, we therefore propose to perform CNN inference cooperatively on the CPU and GPU: we divide the CNN model into layers and assign each layer to its most suitable device to speed up the inference process.

In Chapter 3, our experiments show that some CNN layers have lower computing latency on the CPU, while others have lower computing latency on the GPU. We examine the convolutional, pooling, normalization, and fully connected layers and confirm that this situation does exist. We also investigate the factors that determine whether a layer runs faster on the CPU or the GPU, such as input feature size, kernel size, number of input channels, number of output channels, and batch size. In addition, we design search algorithms for the convolutional, pooling, and normalization layers that locate the points at which a layer's computing latency on the CPU equals that on the GPU (see the first sketch below). These search algorithms can also be applied to other hardware devices.

Based on the findings of Chapter 3, in Chapter 4 we divide the CNN into layers and assign each layer to its appropriate execution device. When partitioning, we must consider both the computing delay of each layer on the CPU and GPU and the transmission delay of data between the CPU and GPU when switching devices. We therefore design an execution device judgment algorithm that weighs the computation delay and the transmission delay together and selects the appropriate execution device for each layer (see the second sketch below). We apply the algorithm to each layer of LeNet and AlexNet, partition the networks according to its results, and perform CPU-GPU collaborative inference. Experimental results show that, compared with inference on the GPU alone, CPU-GPU collaborative inference is 13% faster on LeNet and 9.65% faster on AlexNet.

For batch inference, different batch sizes change the sizes of the inputs and intermediate results of the CNN, so the appropriate execution device may change for some layers. We therefore select the best execution device for each batch size: we run the execution device judgment algorithm under batch inference to determine the appropriate device for every layer at each batch size and construct the corresponding execution device network structure. We perform batch inference on LeNet and AlexNet and select the appropriate execution device network structure for each batch size to further optimize the inference speed.
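The crossover-point search summarized for Chapter 3 can be illustrated with a minimal sketch. It assumes that the CPU-GPU latency gap of a layer changes sign only once as a single configuration parameter (for example, the input feature size) grows, and that the caller supplies timing callbacks `cpu_latency` and `gpu_latency` (hypothetical placeholders for measured layer runs, not part of the thesis); a binary search over that parameter then locates the point at which both devices take roughly equal time.

```python
# Minimal sketch of a crossover-point search (assumption: the sign of
# cpu_latency(n) - gpu_latency(n) flips exactly once on [lo, hi]).
def find_crossover(cpu_latency, gpu_latency, lo, hi):
    """Binary-search the parameter value (e.g. input feature size) at
    which a layer's CPU latency equals its GPU latency."""
    def gap(n):
        return cpu_latency(n) - gpu_latency(n)

    if gap(lo) * gap(hi) > 0:
        return None  # no crossover inside [lo, hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # sign change lies in the lower half
        else:
            lo = mid  # sign change lies in the upper half
    return hi  # first parameter value where the faster device switches
```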
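The execution device judgment described for Chapter 4 can likewise be sketched. The abstract does not spell out the exact algorithm, so the following is a generic dynamic program under assumed inputs: per-layer compute latencies on CPU and GPU and a per-layer CPU-GPU transfer delay, with the network input assumed to be available on either device at no cost. It returns one device per layer so that the total of compute and device-switching delays is minimized.

```python
# Minimal sketch of per-layer device selection that weighs compute delay
# against CPU<->GPU transfer delay. The inputs are hypothetical measured
# values; this is not the thesis's exact judgment algorithm.
def assign_devices(cpu_ms, gpu_ms, transfer_ms):
    """cpu_ms[i], gpu_ms[i]: compute latency of layer i on each device.
    transfer_ms[i]: delay to move layer i's input between CPU and GPU.
    Returns (minimum total latency, chosen device per layer)."""
    best = {"cpu": 0.0, "gpu": 0.0}   # cost of the best plan ending on each device
    plan = {"cpu": [], "gpu": []}     # device sequence of that plan
    for i in range(len(cpu_ms)):
        nxt, nxt_plan = {}, {}
        for dev, run in (("cpu", cpu_ms[i]), ("gpu", gpu_ms[i])):
            other = "gpu" if dev == "cpu" else "cpu"
            stay = best[dev] + run                      # keep layer on the same device
            switch = best[other] + transfer_ms[i] + run  # move the input, then compute
            if stay <= switch:
                nxt[dev], nxt_plan[dev] = stay, plan[dev] + [dev]
            else:
                nxt[dev], nxt_plan[dev] = switch, plan[other] + [dev]
        best, plan = nxt, nxt_plan
    end = min(best, key=best.get)
    return best[end], plan[end]
```

For example, `assign_devices([2.0, 5.0, 1.0], [4.0, 1.5, 3.0], [0.5, 0.8, 0.4])` returns a plan that keeps the first layer on the CPU, moves the second to the GPU, and returns to the CPU for the third, with a total latency of 5.7 ms for these made-up numbers.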