
Research on the Scalability of FPGA-Based Neural Network Accelerators

Posted on: 2020-05-23 | Degree: Master | Type: Thesis
Country: China | Candidate: C Chen | Full Text: PDF
GTID: 2428330578464117 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep learning algorithms represented by deep neural networks have achieved major breakthroughs in many computer vision tasks, such as image classification, object detection, and image quality enhancement. Thanks to their low power consumption and low latency, FPGAs are well suited to small-batch streaming applications with tight power budgets; moreover, because FPGA hardware logic can be configured into application-specific hardware, they are a natural fit for deep learning and have become a highly promising option. At present, two main restrictions hinder the large-scale use of FPGAs for accelerating deep learning: (1) Low development efficiency. As high-level synthesis technology continues to mature, however, FPGAs can be programmed in high-level languages such as C and C++, and this problem is gradually being alleviated. (2) Poor scale scalability. Because of limited on-chip resources, complex algorithms often force a chip upgrade or multi-chip partitioning, and there is no flexible way to deploy the same design onto different hardware.

This thesis therefore focuses on the scale scalability of FPGA-based neural network accelerators. By studying the characteristics of existing FPGA-based neural network accelerators and the reconfigurability of FPGAs, we explore how to flexibly deploy an accelerator for a given neural network algorithm onto FPGAs of different sizes. The hardware design parameters of a SIMD-architecture convolutional neural network accelerator and the effect of data-reuse patterns on scale scalability are analyzed, and an FPGA-based SIMD convolutional neural network accelerator with two-dimensional expansion is designed and implemented. Optimization strategies such as parameter rearrangement, ping-pong buffering, and multi-channel data transmission reduce memory accesses and transmission delay, and improved input and output modules raise bandwidth utilization while further lowering transmission and access latency (illustrative sketches of these techniques follow this abstract).

Taking the YOLOv2 object detection algorithm widely used in industry as an example, the complete process of mapping a CNN model onto the FPGA is described. The performance and resource requirements of the accelerator are analyzed and modeled in depth, taking the actual transmission delay into account, which greatly reduces the error between the predicted and measured latency. CNN accelerators of different scales, generated from different hardware design parameters, are evaluated and compared.

Experimental results show an average performance of 30.15 GOPS on a Zedboard with a Zynq-7020. This corresponds to 7.3x the performance and 120.4x the energy efficiency of the software Darknet implementation on an 8-core Xeon server, and 86x and 112.9x, respectively, over the software version on the Zynq's dual-core ARM Cortex-A9; it also outperforms previous work in performance. The accompanying code has been open-sourced on GitHub.
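The two-dimensional expansion of the SIMD accelerator is not detailed on this page. As a rough, hypothetical illustration in HLS-style C++ (the thesis works with C/C++ high-level synthesis), the sketch below unrolls a convolution tile along two dimensions, with assumed parallelism factors Tm (output channels) and Tn (input channels); scaling the design to a larger or smaller FPGA then amounts to retuning these two factors. All names and sizes here are assumptions, not the thesis's code.

// Hypothetical 2-D SIMD convolution tile in HLS-style C++ (not the thesis's code).
// Once the two inner loops are unrolled, Tm * Tn multiply-accumulate units
// operate in parallel each cycle.
const int Tm = 8;                 // output-channel parallelism (assumed design parameter)
const int Tn = 4;                 // input-channel parallelism (assumed design parameter)
const int R = 32, C = 32, K = 3;  // output tile size and kernel size (assumed)

void conv_tile(float in_buf[Tn][R + K - 1][C + K - 1],  // input tile (+halo)
               float w_buf[Tm][Tn][K][K],               // weight tile
               float out_buf[Tm][R][C]) {               // assumed pre-initialized
  for (int r = 0; r < R; r++) {
    for (int c = 0; c < C; c++) {
      for (int i = 0; i < K; i++) {
        for (int j = 0; j < K; j++) {
#pragma HLS PIPELINE II=1
          for (int m = 0; m < Tm; m++) {    // fully unrolled in hardware
#pragma HLS UNROLL
            for (int n = 0; n < Tn; n++) {  // fully unrolled in hardware
#pragma HLS UNROLL
              out_buf[m][r][c] += w_buf[m][n][i][j] * in_buf[n][r + i][c + j];
            }
          }
        }
      }
    }
  }
}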
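Ping-pong buffering is listed among the optimizations but not elaborated here. The following minimal, self-contained C++ sketch shows the general idea under stated assumptions: while the compute stage works on one buffer, the next tile is loaded into the other, so transfer time hides behind computation. The tile size and the load/compute stages are placeholders, not the thesis's interfaces.

#include <cstring>

const int TILE = 4096;  // floats per tile (assumed)

// Placeholder stages: a real accelerator would use DMA bursts and the
// SIMD convolution kernel here.
static void load_tile(const float *src, float *buf, int t) {
  std::memcpy(buf, src + static_cast<long>(t) * TILE, TILE * sizeof(float));
}
static void compute_tile(const float *buf, float *dst, int t) {
  for (int i = 0; i < TILE; i++)
    dst[static_cast<long>(t) * TILE + i] = buf[i] * 2.0f;  // stand-in compute
}

// Ping-pong loop: tile t is computed from one buffer while tile t+1 is
// loaded into the other; in hardware the two stages run concurrently,
// so per-tile latency approaches max(load, compute) instead of their sum.
void run(const float *dram_in, float *dram_out, int num_tiles) {
  static float buf[2][TILE];
  load_tile(dram_in, buf[0], 0);  // prologue: fill the first buffer
  for (int t = 0; t < num_tiles; t++) {
    const int cur = t & 1;
    if (t + 1 < num_tiles)
      load_tile(dram_in, buf[cur ^ 1], t + 1);  // prefetch next tile
    compute_tile(buf[cur], dram_out, t);
  }
}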
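The abstract states that the performance model accounts for the actual transmission delay, but the exact model is not reproduced on this page. A common form for such a model in double-buffered accelerators, stated here as an assumption, is:

$$ T_{\text{layer}} \approx \max\left( \frac{2\,M\,N\,R\,C\,K^{2}}{T_m\,T_n\,f},\; \frac{B_{\text{in}} + B_{\text{w}} + B_{\text{out}}}{BW_{\text{eff}}} \right) $$

where M and N are the output and input channel counts, R x C the output feature-map size, K the kernel size, Tm and Tn the parallelism factors, f the clock frequency, B the bytes moved per layer, and BW_eff the effective (measured rather than peak) DRAM bandwidth. With ping-pong buffering, per-layer latency is the maximum rather than the sum of compute and transfer time, and using the effective bandwidth is what typically narrows the gap between predicted and measured latency. As a consistency check on the reported numbers, a 7.3x speedup over the Xeon Darknet baseline at 30.15 GOPS implies the baseline sustains roughly 30.15 / 7.3 ≈ 4.1 GOPS.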
Keywords/Search Tags: Convolutional neural network, FPGA, hardware accelerator, high-level synthesis