Convolutional neural networks (CNNs) are widely used in computer vision because of their excellent performance in image processing. With the popularization of GPUs, the rapid development of GPU programming, and the explosive growth of GPU computing power, CPU-GPU heterogeneous deep learning systems have become a common platform for deploying CNN models. However, in such CPU-GPU heterogeneous systems, CNN inference is usually performed entirely on the GPU, so the computing resources of the CPU are not fully utilized. Moreover, during GPU computing, the GPU has to interact with the CPU to perform control operations, which introduces interaction delay. A CNN is not indivisible; it is composed of many different layers. In this thesis, we therefore propose to perform CNN inference cooperatively on the CPU and GPU: we divide the CNN model into layers and assign each layer to its most suitable device to speed up the inference process.

In Chapter 3, our experiments show that some CNN layers have lower computing latency on the CPU, while others have lower computing latency on the GPU. We examine the convolutional, pooling, normalization, and fully connected layers and confirm that this situation does exist. We also investigate the factors that determine whether a layer runs faster on the CPU or the GPU, such as input feature size, kernel size, number of input channels, number of output channels, and batch size. In addition, we design search algorithms for the convolutional, pooling, and normalization layers that locate the points at which a layer's computing latency on the CPU equals that on the GPU (see the first sketch below). These search algorithms can also be applied to other hardware devices.

Based on the findings of Chapter 3, in Chapter 4 we divide the CNN into layers and assign each layer to its appropriate execution device. When partitioning, we must consider both the computing delay of each layer on the CPU and GPU and the transmission delay of data between the CPU and GPU when switching devices. We therefore design an execution device judgment algorithm that weighs the computation delay and the transmission delay together and selects the appropriate execution device for each layer (see the second sketch below). We apply the algorithm to each layer of LeNet and AlexNet, partition the networks according to its results, and perform CPU-GPU collaborative inference. Experimental results show that, compared with inference on the GPU alone, CPU-GPU collaborative inference is 13% faster on LeNet and 9.65% faster on AlexNet.

For batch inference, different batch sizes change the sizes of the inputs and intermediate results of the CNN, so the appropriate execution device may change for some layers. We therefore select the best execution device for each batch size: we run the execution device judgment algorithm under batch inference to determine the appropriate device for every layer at each batch size and construct the corresponding execution device network structure. We perform batch inference on LeNet and AlexNet and select the appropriate execution device network structure for each batch size to further optimize the inference speed.
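The crossover-point search summarized for Chapter 3 can be illustrated with a minimal sketch. It assumes that the CPU-GPU latency gap of a layer changes sign only once as a single configuration parameter (for example, the input feature size) grows, and that the caller supplies timing callbacks `cpu_latency` and `gpu_latency` (hypothetical placeholders for measured layer runs, not part of the thesis); a binary search over that parameter then locates the point at which both devices take roughly equal time.

```python
# Minimal sketch of a crossover-point search (assumption: the sign of
# cpu_latency(n) - gpu_latency(n) flips exactly once on [lo, hi]).
def find_crossover(cpu_latency, gpu_latency, lo, hi):
    """Binary-search the parameter value (e.g. input feature size) at
    which a layer's CPU latency equals its GPU latency."""
    def gap(n):
        return cpu_latency(n) - gpu_latency(n)

    if gap(lo) * gap(hi) > 0:
        return None  # no crossover inside [lo, hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # sign change lies in the lower half
        else:
            lo = mid  # sign change lies in the upper half
    return hi  # first parameter value where the faster device switches
```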
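The execution device judgment described for Chapter 4 can likewise be sketched. The abstract does not spell out the exact algorithm, so the following is a generic dynamic program under assumed inputs: per-layer compute latencies on CPU and GPU and a per-layer CPU-GPU transfer delay, with the network input assumed to be available on either device at no cost. It returns one device per layer so that the total of compute and device-switching delays is minimized.

```python
# Minimal sketch of per-layer device selection that weighs compute delay
# against CPU<->GPU transfer delay. The inputs are hypothetical measured
# values; this is not the thesis's exact judgment algorithm.
def assign_devices(cpu_ms, gpu_ms, transfer_ms):
    """cpu_ms[i], gpu_ms[i]: compute latency of layer i on each device.
    transfer_ms[i]: delay to move layer i's input between CPU and GPU.
    Returns (minimum total latency, chosen device per layer)."""
    best = {"cpu": 0.0, "gpu": 0.0}   # cost of the best plan ending on each device
    plan = {"cpu": [], "gpu": []}     # device sequence of that plan
    for i in range(len(cpu_ms)):
        nxt, nxt_plan = {}, {}
        for dev, run in (("cpu", cpu_ms[i]), ("gpu", gpu_ms[i])):
            other = "gpu" if dev == "cpu" else "cpu"
            stay = best[dev] + run                      # keep layer on the same device
            switch = best[other] + transfer_ms[i] + run  # move the input, then compute
            if stay <= switch:
                nxt[dev], nxt_plan[dev] = stay, plan[dev] + [dev]
            else:
                nxt[dev], nxt_plan[dev] = switch, plan[other] + [dev]
        best, plan = nxt, nxt_plan
    end = min(best, key=best.get)
    return best[end], plan[end]
```

For example, `assign_devices([2.0, 5.0, 1.0], [4.0, 1.5, 3.0], [0.5, 0.8, 0.4])` returns a plan that keeps the first layer on the CPU, moves the second to the GPU, and returns to the CPU for the third, with a total latency of 5.7 ms for these made-up numbers.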