
OpenCL-Accelerated Deep Convolutional Neural Network Inference and Performance Model

Posted on: 2020-04-11    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Qian    Full Text: PDF
GTID: 2428330602451875    Subject: Engineering
Abstract/Summary:
In recent years, with the development of deep convolutional neural network algorithms, increasingly complex networks have emerged. While these networks improve algorithmic performance in many respects, they also bring new requirements and challenges for deploying the algorithms. At the same time, with the development of hardware, many heterogeneous computing devices have emerged, such as CPUs, GPUs, FPGAs, and MICs, and accelerating the inference phase of deep convolutional neural networks on different hardware has become a research hotspot. Designing a cross-platform parallel inference algorithm greatly simplifies the application of these networks and provides a basis for adapting them to different devices, which is of great significance in practice. At the same time, because hardware from different manufacturers differs greatly in architecture, performance, and power consumption across application scenarios, users also face the problem of selecting hardware appropriate to their actual requirements. In view of these problems, the main research contents of this thesis are as follows.

To achieve cross-platform parallel inference for deep convolutional neural networks, we propose a parallel inference algorithm based on OpenCL. We analyze the parallelism of traditional convolution and of depthwise separable convolution, design and implement parallel OpenCL kernels, and combine them with clBLAS-based parallel matrix multiplication to accelerate both kinds of convolution. Experiments show that the proposed parallel depthwise separable convolution outperforms both Caffe and the diagonal-reconstruction approach. We then design OpenCL kernels for the remaining operations of the inference phase, implement the MobileNet v1 network and a residual network, and further improve performance through kernel fusion and by increasing the global workload. Finally, compared with Caffe's GPU implementation, the two networks achieve speedups of 40.16x and 1.67x, respectively, on an AMD Radeon Vega Frontier GPU, and 14.95x and 1.11x on an NVIDIA GTX 1070 GPU, which verifies the portability of the OpenCL code.
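For illustration, the sketch below shows what a depthwise convolution kernel of this kind might look like in OpenCL C: one work-item per output element over a three-dimensional NDRange (width x height x channels), with the activation fused into the same kernel in the spirit of the kernel-fusion optimization mentioned above. The kernel name, the fixed 3x3/stride-1/pad-1 shape, and the data layouts are assumptions made for the example; this is not the thesis's actual code.

```c
/* Minimal sketch (hypothetical, not the thesis's kernel): 3x3, stride-1,
 * pad-1 depthwise convolution with a fused ReLU activation.
 * Assumed layouts: input [C][H][W], weights [C][3][3], output [C][H][W]. */
__kernel void depthwise_conv3x3(__global const float *input,
                                __global const float *weights,
                                __global const float *bias,
                                __global float *output,
                                const int C, const int H, const int W)
{
    const int x = get_global_id(0);   /* output column */
    const int y = get_global_id(1);   /* output row    */
    const int c = get_global_id(2);   /* channel       */
    if (x >= W || y >= H || c >= C) return;

    float acc = bias[c];
    for (int kh = 0; kh < 3; ++kh)
        for (int kw = 0; kw < 3; ++kw) {
            const int iy = y + kh - 1;        /* padding = 1 */
            const int ix = x + kw - 1;
            if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                acc += input[(c * H + iy) * W + ix]
                     * weights[(c * 3 + kh) * 3 + kw];
        }
    /* Kernel fusion: apply the activation here instead of in a separate
     * kernel, avoiding an extra round trip through global memory. */
    output[(c * H + y) * W + x] = fmax(acc, 0.0f);
}
```

Launching one work-item per output element also keeps the global work size large, which echoes the global-workload optimization above: on GPUs, a larger NDRange generally improves occupancy and latency hiding.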
To guide hardware selection when deploying deep convolutional neural network inference, we propose an inference performance model based on a multi-layer perceptron (MLP), which predicts the inference time of a network on different hardware platforms. We study the factors that affect inference performance on different platforms, including the network's own structure and the hardware and software configuration used, take these factors as input features, and determine the value range of each feature. Using Caffe, we measure the running time of each operator on NVIDIA and AMD GPUs under different feature values, and we train MLPs for a single hardware platform and for multiple hardware platforms, respectively. The MLP predicts the running time of each operator on a given device, and the total running time of the network is obtained by summing the predicted times of all operators. Finally, the average relative error in predicting the inference time of the VGG16 network across different batch sizes and different hardware is 6.32%, which verifies the validity of the performance model.
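As an illustration of how such per-operator predictions combine into a network-level estimate, the following C sketch implements a single-hidden-layer perceptron with a ReLU activation and a linear output that maps encoded operator and hardware features to a time estimate, then sums the estimates over all operators. The struct layout, feature encoding, layer size, and all names are hypothetical; the thesis's actual trained model is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical per-operator latency predictor: one hidden ReLU layer,
 * linear output. Features might encode e.g. batch size, channel counts,
 * kernel size, stride, input size, and a device identifier as floats. */
typedef struct {
    int in_dim, hidden_dim;
    const float *w1;  /* [hidden_dim][in_dim], row-major, learned */
    const float *b1;  /* [hidden_dim], learned                    */
    const float *w2;  /* [hidden_dim], learned                    */
    float b2;         /* output bias, learned                     */
} mlp_t;

static float mlp_predict(const mlp_t *m, const float *x)
{
    float t = m->b2;
    for (int i = 0; i < m->hidden_dim; ++i) {
        float h = m->b1[i];
        for (int j = 0; j < m->in_dim; ++j)
            h += m->w1[i * m->in_dim + j] * x[j];
        if (h < 0.0f) h = 0.0f;   /* ReLU hidden unit */
        t += m->w2[i] * h;        /* linear output: predicted time */
    }
    return t;
}

/* Whole-network inference time = sum of per-operator predictions,
 * one feature vector per operator in the network. */
static float predict_network_time(const mlp_t *m, const float *features,
                                  int n_ops)
{
    float total = 0.0f;
    for (int k = 0; k < n_ops; ++k)
        total += mlp_predict(m, features + (size_t)k * m->in_dim);
    return total;
}
```

The per-operator decomposition is what lets a model trained on individual operator measurements estimate whole networks: any network built from known operator types can be scored by summing its operators' predicted times.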
Keywords/Search Tags: OpenCL, Deep Convolutional Neural Network Algorithm, GPU, Multi-Layer Perceptron, Performance Model