
Acceleration And Optimization Of Deep Convolutional Neural Networks Based On FPGA

Posted on: 2022-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2518306509456084 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Deep convolutional neural networks (DCNNs) have become one of the key algorithms for digital image classification in deep learning, because a DCNN can learn highly representative image features from sufficient training data. Nevertheless, the computational complexity of DCNNs is much greater than that of classical algorithms, and due to resource, energy-efficiency, and real-time constraints, many DCNN applications on CPU or GPU platforms cannot meet the requirements of highly real-time scenarios. Therefore, implementing DCNNs on embedded systems, which can improve their computation speed and real-time performance, is attracting increasing attention. Field Programmable Gate Arrays (FPGAs) offer both high performance and programmability, so many researchers use FPGAs to accelerate DCNNs in order to reduce computation latency and energy consumption.

The running time of a DCNN on an FPGA is mainly determined by the DCNN layers and by memory transmission. However, most previous work accelerates only one of these, and few hardware accelerators accelerate all of them. This thesis designs a high-performance DCNN hardware accelerator that optimizes the convolutional layers, the fully connected layers, and the memory transmission of a DCNN. Specifically, this thesis uses a tight fast-multiplication scheme to improve the Winograd algorithm and increase the computation speed of the convolutional layers; the same fast multiplication also speeds up the fully connected layers. For data transmission between off-chip and on-chip memory, this thesis proposes a trapezoid reusing strategy that minimizes the memory transmission of the convolutional layers.

On a Zynq XC7Z100 development board, we verified the performance of the DCNN hardware accelerator with AlexNet, VGG-16, and ResNet-50, achieving throughputs of 65.3 GFLOP/s, 505.3 GFLOP/s, and 560.2 GFLOP/s respectively.
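The Winograd algorithm mentioned in the abstract trades multiplications for additions, which is attractive on FPGAs where DSP multipliers are scarce. The abstract does not give the thesis's exact tile size, so as an illustrative sketch only, here is the standard 1-D Winograd F(2,3) transform: it produces two outputs of a 3-tap correlation with 4 multiplications instead of the 6 a direct computation needs.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (not taken from the thesis;
# this is the textbook formulation): y = A^T [(G g) * (B^T d)].
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 correlation outputs."""
    U = G @ g            # filter transform (precomputable once per filter)
    V = BT @ d           # input-tile transform
    return AT @ (U * V)  # 4 elementwise multiplies, then inverse transform

# Check against the direct sliding-window correlation.
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

The 2-D variant used for convolutional layers nests this transform over rows and columns; the filter transform `G @ g` can be computed offline, so only the input and inverse transforms run per tile on the accelerator.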
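The abstract does not specify how the trapezoid reusing strategy works internally, but the general idea behind input-reuse tiling can be sketched as follows (the function, its parameters, and the tiling model are my illustrative assumptions, not the thesis's design): adjacent tiles of a KxK convolution with stride S overlap by K - S input rows, and keeping those rows in on-chip buffers avoids re-fetching them from off-chip DRAM.

```python
def rows_transferred(H, tile_rows, K, S, reuse):
    """Illustrative model: input rows fetched from off-chip memory to
    compute all output tiles of height tile_rows for an H-row input,
    a KxK kernel, and stride S.  Assumes tile_rows divides the output
    height exactly."""
    out_rows = (H - K) // S + 1
    tiles = out_rows // tile_rows
    in_rows_per_tile = (tile_rows - 1) * S + K  # rows one tile reads
    if not reuse:
        # every tile fetches its full input window, overlaps included
        return tiles * in_rows_per_tile
    # with on-chip reuse, each later tile fetches only its new rows
    new_rows_per_tile = tile_rows * S
    return in_rows_per_tile + (tiles - 1) * new_rows_per_tile

# 226-row input, 3x3 kernel, stride 1, 8-row output tiles:
no_reuse = rows_transferred(226, 8, 3, 1, reuse=False)  # 28 tiles * 10 rows
with_reuse = rows_transferred(226, 8, 3, 1, reuse=True)
assert no_reuse == 280
assert with_reuse == 226  # every input row crosses the bus exactly once
```

Under this simple model, reuse reduces the traffic to one fetch per input row; the thesis's trapezoid strategy presumably generalizes this kind of overlap reuse to minimize convolutional-layer memory transmission.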
Keywords/Search Tags: FPGA, DCNN, accelerating computation, optimizing data transmission, trapezoid reusing