
Design And Implementation Of A Deep Convolutional Neural Network Acceleration System Based On A Heterogeneous Processor

Posted on: 2019-01-10    Degree: Master    Type: Thesis
Country: China    Candidate: D K Jiang    Full Text: PDF
GTID: 2428330545972237    Subject: Electronic and communication engineering
Abstract/Summary:
Deep learning algorithms based on deep convolutional neural networks (DCNNs) show great advantages over traditional schemes in many applications, such as image classification, video analysis, and speech recognition. A DCNN works with multiple convolution layers that extract features from the input data, followed by classification layers that make decisions. Because the convolution and fully connected layers are compute-intensive, it is difficult to perform real-time classification with low power consumption. This thesis presents a convolutional neural network acceleration system for low-cost, low-power SoC-FPGAs based on the OpenCL heterogeneous parallel computing framework.

We first analyze the computational complexity and degree of parallelism of convolutional neural networks. Then, under the OpenCL heterogeneous parallel computing framework, we design Convolution, Pooling, LRN, and Data Mover kernels. An OpenCL-based design methodology and a hardware architecture of deeply pipelined kernels are proposed. The cascaded kernel pipeline can execute a series of basic DCNN operations without storing interlayer data back to global memory, which significantly reduces the bandwidth requirement. The final design was implemented on a Cyclone-V SoC-FPGA.

To verify the generality of the proposed acceleration system, two DCNN models of different depths, AlexNet and VGG-16, were chosen for two application experiments: object classification and face recognition. We achieved an average classification time of 120 ms with a system power dissipation of 2.1 W. Results show that our scheme achieves up to 170× and 4× speedup at similar power consumption compared with state-of-the-art software accelerators on a mobile CPU and GPU, respectively. To the best of our knowledge, this work presents the first study of an OpenCL-based DCNN accelerator targeting low-cost, low-power SoC-FPGAs. To compare with other FPGA-based designs, we recompiled the proposed design for a Stratix-V A7 FPGA and measured its performance on the DE5-Net board; the average classification time is 10.5 ms, which improves the DCNN runtime by 4.3× with a similar cost in DSP resources and power consumption.
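To make the cascaded-pipeline idea concrete, the following OpenCL sketch shows how a Data Mover, a convolution kernel, and a pooling kernel can be chained through on-chip channels so that intermediate data never returns to global memory. It is a minimal illustration assuming the Intel FPGA OpenCL channel extension; the kernel names, channel depths, and the simplified 1-D convolution and 2-to-1 pooling are illustrative assumptions, not the thesis implementation.

// Minimal sketch of a cascaded kernel pipeline using the Intel FPGA OpenCL
// channel extension. Kernel names, channel depths, and the simplified 1-D
// convolution/pooling are illustrative assumptions, not the thesis code.
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel float conv_in_ch  __attribute__((depth(64)));
channel float conv_out_ch __attribute__((depth(64)));

// Data Mover: streams input feature-map values from global memory into the
// pipeline so the compute kernels never access global memory themselves.
__kernel void data_mover(__global const float *restrict fmap, int n) {
    for (int i = 0; i < n; i++)
        write_channel_intel(conv_in_ch, fmap[i]);
}

// Convolution (simplified to a 3-tap 1-D filter): consumes the input stream,
// applies the filter weights, and forwards results to the next stage.
__kernel void conv(__constant float *restrict w, int n) {
    float win[3] = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i++) {
        win[0] = win[1];
        win[1] = win[2];
        win[2] = read_channel_intel(conv_in_ch);
        if (i >= 2) {
            float acc = win[0] * w[0] + win[1] * w[1] + win[2] * w[2];
            write_channel_intel(conv_out_ch, acc);
        }
    }
}

// Pooling (2-to-1 max pooling): consumes the convolution stream and writes
// only the final results back to global memory.
__kernel void pool(__global float *restrict out, int m) {
    for (int i = 0; i < m; i++) {
        float a = read_channel_intel(conv_out_ch);
        float b = read_channel_intel(conv_out_ch);
        out[i] = fmax(a, b);
    }
}

Because the three kernels run concurrently and exchange data only through the on-chip channels, interlayer results stay on chip, which is the mechanism by which the proposed design reduces the external bandwidth requirement.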
Keywords/Search Tags:Deep convolutional neural network, OpenCL, Low-cost, Low-power, SoC-FPGA, Heterogeneous computing