
Acceleration And Optimization Of Deep Convolutional Neural Networks Based On FPGA

Posted on: 2022-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2518306509456084 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Deep convolutional neural networks (DCNNs) have become one of the key algorithms for digital image classification in deep learning, because a DCNN can learn highly representative image features from sufficient training data. Nevertheless, the computational complexity of DCNNs is much greater than that of classical algorithms, and due to resource, energy-efficiency, and real-time constraints, many DCNN applications on CPU or GPU platforms cannot meet the requirements of highly real-time scenarios. Therefore, implementing DCNNs on embedded systems, which can improve their computation speed and real-time performance, is attracting increasing attention. Field Programmable Gate Arrays (FPGAs) offer both high performance and programmability, so many researchers use FPGAs to accelerate DCNNs in order to reduce computation latency and energy consumption.

The running time of a DCNN on an FPGA is mainly determined by the DCNN layers and by memory transmission. However, most previous work accelerates only one of these, and few hardware accelerators accelerate all of them. This thesis designs a high-performance DCNN hardware accelerator that optimizes the convolutional layers, the fully connected layers, and the memory transmission of a DCNN. Specifically, this thesis uses a tight fast-multiplication scheme to improve the Winograd algorithm and increase the computation speed of the convolutional layers; the same fast multiplication also speeds up the fully connected layers. For data transmission between off-chip and on-chip memory, this thesis proposes a trapezoid reusing strategy that minimizes the memory transmission of the convolutional layers.

On a Zynq XC7Z100 development board, we verified the performance of the DCNN hardware accelerator with AlexNet, VGG-16, and ResNet-50, achieving throughputs of 65.3 GFLOP/s, 505.3 GFLOP/s, and 560.2 GFLOP/s respectively.
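The Winograd algorithm mentioned in the abstract trades multiplications for additions, which is attractive on FPGAs where DSP multipliers are scarce. The abstract does not give the thesis's exact tile size, so as an illustrative sketch only, here is the standard 1-D Winograd F(2,3) transform: it produces two outputs of a 3-tap correlation with 4 multiplications instead of the 6 a direct computation needs.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (not taken from the thesis;
# this is the textbook formulation): y = A^T [(G g) * (B^T d)].
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 correlation outputs."""
    U = G @ g            # filter transform (precomputable once per filter)
    V = BT @ d           # input-tile transform
    return AT @ (U * V)  # 4 elementwise multiplies, then inverse transform

# Check against the direct sliding-window correlation.
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

The 2-D variant used for convolutional layers nests this transform over rows and columns; the filter transform `G @ g` can be computed offline, so only the input and inverse transforms run per tile on the accelerator.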
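The abstract does not specify how the trapezoid reusing strategy works internally, but the general idea behind input-reuse tiling can be sketched as follows (the function, its parameters, and the tiling model are my illustrative assumptions, not the thesis's design): adjacent tiles of a KxK convolution with stride S overlap by K - S input rows, and keeping those rows in on-chip buffers avoids re-fetching them from off-chip DRAM.

```python
def rows_transferred(H, tile_rows, K, S, reuse):
    """Illustrative model: input rows fetched from off-chip memory to
    compute all output tiles of height tile_rows for an H-row input,
    a KxK kernel, and stride S.  Assumes tile_rows divides the output
    height exactly."""
    out_rows = (H - K) // S + 1
    tiles = out_rows // tile_rows
    in_rows_per_tile = (tile_rows - 1) * S + K  # rows one tile reads
    if not reuse:
        # every tile fetches its full input window, overlaps included
        return tiles * in_rows_per_tile
    # with on-chip reuse, each later tile fetches only its new rows
    new_rows_per_tile = tile_rows * S
    return in_rows_per_tile + (tiles - 1) * new_rows_per_tile

# 226-row input, 3x3 kernel, stride 1, 8-row output tiles:
no_reuse = rows_transferred(226, 8, 3, 1, reuse=False)  # 28 tiles * 10 rows
with_reuse = rows_transferred(226, 8, 3, 1, reuse=True)
assert no_reuse == 280
assert with_reuse == 226  # every input row crosses the bus exactly once
```

Under this simple model, reuse reduces the traffic to one fetch per input row; the thesis's trapezoid strategy presumably generalizes this kind of overlap reuse to minimize convolutional-layer memory transmission.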
Keywords/Search Tags: FPGA, DCNN, accelerating computation, optimizing data transmission, trapezoid reusing