
Research On Acceleration Of Convolutional Neural Networks On FPGA Based On OpenCL

Posted on: 2019-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: H Sun
Full Text: PDF
GTID: 2428330563993356
Subject: Computer technology
Abstract/Summary:
With the continuous development of deep learning, convolutional neural networks are widely used in practical scenarios such as speech recognition, face detection, and natural language processing. In the field of computer vision in particular, convolutional neural networks offer better robustness and broader applicability than traditional machine learning algorithms. However, as application scenarios grow more complex and application fields more extensive, the structures of convolutional neural networks become increasingly complex, and their demands on hardware computing power keep rising. Although the CPUs and GPUs commonly used to run convolutional neural networks can meet certain application requirements, they are less suitable for power-sensitive deployments such as mobile terminals and large-scale servers. FPGAs offer configurability and low power consumption, so running convolutional neural networks on FPGAs has broad application prospects. The subject of this thesis is how to run the forward pass of a convolutional neural network on an FPGA and accelerate it.

This thesis first analyzes the computational complexity and space complexity of the convolutional neural network based on its computational methods and structural characteristics. Using VGG16 as an example, the Roofline model is applied to identify the network's performance bottlenecks and potential optimization strategies. The analysis shows that the computational bottleneck is concentrated in the convolution layers, while the memory-bandwidth bottleneck lies mainly in the fully-connected layers. Based on these results, an optimization scheme is proposed. For the convolution layers, the Winograd minimal filtering algorithm is used to reduce computational complexity; for the fully-connected layers, a
batch calculation method is used to reduce bandwidth usage; and for the pooling layers, a ping-pong buffer structure increases computational parallelism to speed up the calculation. At the same time, combining the computational characteristics of convolutional neural networks with the hardware structure of the FPGA, a pipelined and parallel computing structure is designed to accelerate the computation. In the pipeline structure, the calculation of each layer is decomposed into multiple small computation units so that the computations of those units can execute in a pipelined fashion; in the parallel structure, the input feature maps and convolution kernels are regrouped so that the operations of multiple convolution kernels can be performed in parallel.

This thesis implements the above optimization scheme in OpenCL. Several groups of controlled experiments were conducted using VGG16 and AlexNet as examples. The results show that the processing speed of the convolutional neural network on the FPGA lies between that of the CPU and the GPU, but the FPGA has the lowest energy consumption: its energy efficiency is 56.8 times that of the CPU and 3.4 times that of the GPU. Compared with other FPGA implementations, the method in this thesis achieves the highest system throughput. Finally, to combine theoretical research with practical application, this thesis designs an FPGA-based face recognition system.
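The Roofline-style analysis above can be sketched numerically. The following is a minimal illustration, not the thesis's own model: it computes arithmetic intensity (FLOPs per byte of off-chip traffic) for one VGG16 convolution layer and for the fc6 fully-connected layer. The layer shapes are VGG16's; the float32 assumption, the simplified traffic model (weights plus input/output activations), and the function names are ours. It also shows why batching helps the fully-connected layer: weights are fetched once per batch, so intensity grows with batch size.

```python
# Hypothetical sketch of arithmetic intensity = FLOPs / bytes moved,
# assuming float32 (4 bytes) and that all weights and activations
# cross the memory interface exactly once.

def conv_intensity(h, w, c_in, c_out, k, bytes_per=4):
    flops = 2 * h * w * c_out * c_in * k * k             # multiply-adds
    data = (c_in * c_out * k * k                         # weights
            + h * w * (c_in + c_out)) * bytes_per        # in/out activations
    return flops / data

def fc_intensity(n_in, n_out, batch=1, bytes_per=4):
    flops = 2 * batch * n_in * n_out
    data = (n_in * n_out                                 # weights, loaded once
            + batch * (n_in + n_out)) * bytes_per        # per-sample activations
    return flops / data

ai_conv = conv_intensity(224, 224, 64, 64, 3)    # a VGG16 3x3 conv layer
ai_fc1 = fc_intensity(25088, 4096)               # VGG16 fc6, batch = 1
ai_fc32 = fc_intensity(25088, 4096, batch=32)    # fc6 with batching
```

With these assumptions the convolution layer's intensity is in the hundreds of FLOPs/byte (compute-bound), while fc6 at batch 1 is below 1 FLOP/byte (bandwidth-bound); batching raises it roughly in proportion to the batch size, which matches the abstract's motivation for batching the fully-connected layers.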
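The convolution-layer optimization above rests on Winograd's minimal filtering algorithm. As a sketch of the idea (in its standard 1-D form F(2,3), using the usual transform matrices; the test data and helper names are our own illustration), two outputs of a 3-tap filter are computed with 4 multiplications instead of the 6 a direct convolution needs:

```python
# F(2,3) minimal filtering: y = A^T [ (G g) * (B^T d) ], where d is a
# 4-sample input tile, g a 3-tap filter, and * is elementwise product.
BT = [[1,  0, -1,  0],
      [0,  1,  1,  0],
      [0, -1,  1,  0],
      [0,  1,  0, -1]]
G  = [[1.0,  0.0, 0.0],
      [0.5,  0.5, 0.5],
      [0.5, -0.5, 0.5],
      [0.0,  0.0, 1.0]]
AT = [[1, 1,  1,  0],
      [0, 1, -1, -1]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def winograd_f23(d, g):
    """Two outputs of sliding 3-tap filter g over 4-sample input d."""
    U = matvec(G, g)                    # filter transform (precomputable)
    V = matvec(BT, d)                   # input transform
    M = [u * v for u, v in zip(U, V)]   # only 4 multiplications
    return matvec(AT, M)                # output transform

def direct(d, g):
    """Reference: direct sliding-window computation (6 multiplications)."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]
```

The 2-D variant used for 3x3 convolution layers nests this transform in both dimensions; since the filter transform `G g` can be precomputed, the multiplication count per output drops substantially, which is the source of the complexity reduction claimed above.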
Keywords/Search Tags: FPGA, convolutional neural network, OpenCL, Winograd minimal filtering algorithm