
OpenCL-Accelerated Deep Convolutional Neural Network Inference and Performance Model

Posted on: 2020-04-11    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Qian    Full Text: PDF
GTID: 2428330602451875    Subject: Engineering
Abstract/Summary:
In recent years, with the development of deep convolutional neural network algorithms, increasingly complex networks have emerged. While these networks improve algorithmic performance in many respects, they also bring new requirements and challenges for deploying the algorithms. At the same time, with the development of hardware, many heterogeneous computing devices have emerged, such as CPUs, GPUs, FPGAs, and MICs, and accelerating the inference phase of deep convolutional neural networks on different hardware has become a research hotspot. Designing a cross-platform parallel inference algorithm greatly simplifies the application of these networks and provides a basis for adapting them to different devices, which is of great significance in practice. At the same time, because hardware from different manufacturers differs greatly in architecture, performance, and power consumption across application scenarios, users also face the problem of selecting hardware appropriate to their actual requirements. In view of these problems, the main research contents of this thesis are as follows.

To achieve cross-platform parallel inference for deep convolutional neural networks, we propose a parallel inference algorithm based on OpenCL. We analyze the parallelism of traditional convolution and of depthwise separable convolution, design and implement parallel OpenCL kernels, and combine them with clBLAS-based parallel matrix multiplication to accelerate both kinds of convolution. Experiments show that the proposed parallel depthwise separable convolution outperforms both Caffe and the diagonal-reconstruction approach. We then design OpenCL kernels for the remaining operations of the inference phase, implement the MobileNet v1 network and a residual network, and further improve performance through kernel fusion and by increasing the global workload. Finally, compared with Caffe's GPU implementation, the two networks achieve speedups of 40.16x and 1.67x, respectively, on an AMD Radeon Vega Frontier GPU, and 14.95x and 1.11x on an NVIDIA GTX 1070 GPU, which verifies the portability of the OpenCL code.
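For illustration, the sketch below shows what a depthwise convolution kernel of this kind might look like in OpenCL C: one work-item per output element over a three-dimensional NDRange (width x height x channels), with the activation fused into the same kernel in the spirit of the kernel-fusion optimization mentioned above. The kernel name, the fixed 3x3/stride-1/pad-1 shape, and the data layouts are assumptions made for the example; this is not the thesis's actual code.

```c
/* Minimal sketch (hypothetical, not the thesis's kernel): 3x3, stride-1,
 * pad-1 depthwise convolution with a fused ReLU activation.
 * Assumed layouts: input [C][H][W], weights [C][3][3], output [C][H][W]. */
__kernel void depthwise_conv3x3(__global const float *input,
                                __global const float *weights,
                                __global const float *bias,
                                __global float *output,
                                const int C, const int H, const int W)
{
    const int x = get_global_id(0);   /* output column */
    const int y = get_global_id(1);   /* output row    */
    const int c = get_global_id(2);   /* channel       */
    if (x >= W || y >= H || c >= C) return;

    float acc = bias[c];
    for (int kh = 0; kh < 3; ++kh)
        for (int kw = 0; kw < 3; ++kw) {
            const int iy = y + kh - 1;        /* padding = 1 */
            const int ix = x + kw - 1;
            if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                acc += input[(c * H + iy) * W + ix]
                     * weights[(c * 3 + kh) * 3 + kw];
        }
    /* Kernel fusion: apply the activation here instead of in a separate
     * kernel, avoiding an extra round trip through global memory. */
    output[(c * H + y) * W + x] = fmax(acc, 0.0f);
}
```

Launching one work-item per output element also keeps the global work size large, which echoes the global-workload optimization above: on GPUs, a larger NDRange generally improves occupancy and latency hiding.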
To guide hardware selection when deploying deep convolutional neural network inference, we propose an inference performance model based on a multi-layer perceptron (MLP), which predicts the inference time of a network on different hardware platforms. We study the factors that affect inference performance on different platforms, including the network's own structure and the hardware and software configuration used, take these factors as input features, and determine the value range of each feature. Using Caffe, we measure the running time of each operator on NVIDIA and AMD GPUs under different feature values, and we train MLPs for a single hardware platform and for multiple hardware platforms, respectively. The MLP predicts the running time of each operator on a given device, and the total running time of the network is obtained by summing the predicted times of all operators. Finally, the average relative error in predicting the inference time of the VGG16 network across different batch sizes and different hardware is 6.32%, which verifies the validity of the performance model.
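As an illustration of how such per-operator predictions combine into a network-level estimate, the following C sketch implements a single-hidden-layer perceptron with a ReLU activation and a linear output that maps encoded operator and hardware features to a time estimate, then sums the estimates over all operators. The struct layout, feature encoding, layer size, and all names are hypothetical; the thesis's actual trained model is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical per-operator latency predictor: one hidden ReLU layer,
 * linear output. Features might encode e.g. batch size, channel counts,
 * kernel size, stride, input size, and a device identifier as floats. */
typedef struct {
    int in_dim, hidden_dim;
    const float *w1;  /* [hidden_dim][in_dim], row-major, learned */
    const float *b1;  /* [hidden_dim], learned                    */
    const float *w2;  /* [hidden_dim], learned                    */
    float b2;         /* output bias, learned                     */
} mlp_t;

static float mlp_predict(const mlp_t *m, const float *x)
{
    float t = m->b2;
    for (int i = 0; i < m->hidden_dim; ++i) {
        float h = m->b1[i];
        for (int j = 0; j < m->in_dim; ++j)
            h += m->w1[i * m->in_dim + j] * x[j];
        if (h < 0.0f) h = 0.0f;   /* ReLU hidden unit */
        t += m->w2[i] * h;        /* linear output: predicted time */
    }
    return t;
}

/* Whole-network inference time = sum of per-operator predictions,
 * one feature vector per operator in the network. */
static float predict_network_time(const mlp_t *m, const float *features,
                                  int n_ops)
{
    float total = 0.0f;
    for (int k = 0; k < n_ops; ++k)
        total += mlp_predict(m, features + (size_t)k * m->in_dim);
    return total;
}
```

The per-operator decomposition is what lets a model trained on individual operator measurements estimate whole networks: any network built from known operator types can be scored by summing its operators' predicted times.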
Keywords/Search Tags: OpenCL, Deep Convolutional Neural Network Algorithm, GPU, Multi-Layer Perceptron, Performance Model