
Design And Implementation Of A Deep Convolutional Neural Network Acceleration System Based On A Heterogeneous Processor

Posted on: 2019-01-10    Degree: Master    Type: Thesis
Country: China    Candidate: D K Jiang    Full Text: PDF
GTID: 2428330545972237    Subject: Electronic and communication engineering
Abstract/Summary:
Deep learning algorithms based on deep convolutional neural networks (DCNNs) show great advantages over traditional schemes in many applications, such as image classification, video analysis, and speech recognition. A DCNN works with multiple convolution layers that extract features from the input data, followed by classification layers that make decisions. Because the convolution and fully connected layers are compute-intensive, it is difficult to perform real-time classification with low power consumption. This thesis presents a convolutional neural network acceleration system for low-cost, low-power SoC-FPGAs based on the OpenCL heterogeneous parallel computing framework.

We first analyze the computational complexity and degree of parallelism of convolutional neural networks. Then, under the OpenCL heterogeneous parallel computing framework, we design Convolution, Pooling, LRN, and Data Mover kernels. An OpenCL-based design methodology and a hardware architecture of deeply pipelined kernels are proposed. The cascaded kernel pipeline can execute a series of basic DCNN operations without storing interlayer data back to global memory, which significantly reduces the bandwidth requirement. The final design was implemented on a Cyclone-V SoC-FPGA.

To verify the generality of the proposed acceleration system, two DCNN models of different depths, AlexNet and VGG-16, were chosen for two application experiments: object classification and face recognition. We achieved an average classification time of 120 ms with a system power dissipation of 2.1 W. Results show that our scheme achieves up to 170× and 4× speedup at similar power consumption compared with state-of-the-art software accelerators on a mobile CPU and GPU, respectively. To the best of our knowledge, this work presents the first study of an OpenCL-based DCNN accelerator targeting low-cost, low-power SoC-FPGAs. To compare with other FPGA-based designs, we recompiled the proposed design for a Stratix-V A7 FPGA and measured its performance on the DE5-Net board; the average classification time is 10.5 ms, which improves the DCNN runtime by 4.3× with a similar cost in DSP resources and power consumption.
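To make the cascaded-pipeline idea concrete, the following OpenCL sketch shows how a Data Mover, a convolution kernel, and a pooling kernel can be chained through on-chip channels so that intermediate data never returns to global memory. It is a minimal illustration assuming the Intel FPGA OpenCL channel extension; the kernel names, channel depths, and the simplified 1-D convolution and 2-to-1 pooling are illustrative assumptions, not the thesis implementation.

// Minimal sketch of a cascaded kernel pipeline using the Intel FPGA OpenCL
// channel extension. Kernel names, channel depths, and the simplified 1-D
// convolution/pooling are illustrative assumptions, not the thesis code.
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel float conv_in_ch  __attribute__((depth(64)));
channel float conv_out_ch __attribute__((depth(64)));

// Data Mover: streams input feature-map values from global memory into the
// pipeline so the compute kernels never access global memory themselves.
__kernel void data_mover(__global const float *restrict fmap, int n) {
    for (int i = 0; i < n; i++)
        write_channel_intel(conv_in_ch, fmap[i]);
}

// Convolution (simplified to a 3-tap 1-D filter): consumes the input stream,
// applies the filter weights, and forwards results to the next stage.
__kernel void conv(__constant float *restrict w, int n) {
    float win[3] = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i++) {
        win[0] = win[1];
        win[1] = win[2];
        win[2] = read_channel_intel(conv_in_ch);
        if (i >= 2) {
            float acc = win[0] * w[0] + win[1] * w[1] + win[2] * w[2];
            write_channel_intel(conv_out_ch, acc);
        }
    }
}

// Pooling (2-to-1 max pooling): consumes the convolution stream and writes
// only the final results back to global memory.
__kernel void pool(__global float *restrict out, int m) {
    for (int i = 0; i < m; i++) {
        float a = read_channel_intel(conv_out_ch);
        float b = read_channel_intel(conv_out_ch);
        out[i] = fmax(a, b);
    }
}

Because the three kernels run concurrently and exchange data only through the on-chip channels, interlayer results stay on chip, which is the mechanism by which the proposed design reduces the external bandwidth requirement.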
Keywords/Search Tags:Deep convolutional neural network, OpenCL, Low-cost, Low-power, SoC-FPGA, Heterogeneous computing