
Research on a Rapid Design Method for FPGA-Based Convolutional Neural Network Accelerators

Posted on: 2021-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: Q Guo    Full Text: PDF
GTID: 2518306503974609    Subject: IC Engineering
Abstract/Summary:
Convolutional neural networks are widely used in many fields owing to their representation-learning capability. In recent years, as networks have grown in number and size, rapid design methods for FPGA-based convolutional neural network accelerators have become an important research direction. However, existing designs are optimized mainly for throughput, and their overall latency is usually long, which cannot meet the needs of real-time applications. To address this problem, this thesis proposes a rapid design method for convolutional neural network accelerators based on a fine-grained pipeline architecture.

The design flow is divided into two parts: front-end design and back-end design. The front-end performs hardware-oriented preprocessing of the original network through a model parser and model optimization, while the back-end selects parameters for pre-designed hardware templates through design space exploration and assembles the complete hardware code for deployment on the FPGA.

To optimize overall latency, this thesis first applies parameter quantization and layer fusion to reduce computation and storage overhead. Then, exploiting the characteristics of the fine-grained pipeline architecture, a column-based convolution computation scheme is proposed, and the convolution-layer computation template and the on-chip buffer template are designed. Finally, a design space exploration strategy based on the Roofline model is proposed; by rationally allocating hardware resources to each pipeline stage, the design point with the shortest overall latency is obtained.

The proposed design space exploration strategy is verified on AlexNet, VGG-16, Cifar10-fcn and YOLOv2-tiny, and a hardware implementation is completed for YOLOv2-tiny. At a clock frequency of 200 MHz, the accelerator achieves a throughput of 464.5 GOPS, an energy efficiency of 45.3 GOPS/W, and an overall latency of 27.78 ms. Compared with similar designs, the overall latency of this work is significantly reduced while throughput and energy efficiency are improved.
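The abstract mentions parameter quantization and layer fusion but does not spell out the exact scheme. A common instance of layer fusion is folding batch normalization into the preceding convolution, and a common quantization choice is symmetric fixed-point; the Python sketch below illustrates both under those assumptions (the function names and the 8-bit default are illustrative, not taken from the thesis).

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-normalization layer into the preceding convolution.

    w: conv weights, shape (out_ch, in_ch, kh, kw); b: conv bias, shape (out_ch,).
    gamma, beta, mean, var: per-channel BN parameters, shape (out_ch,).
    Returns fused weights and bias so the BN layer can be removed at inference time.
    """
    scale = gamma / np.sqrt(var + eps)           # per-output-channel scale factor
    w_fused = w * scale[:, None, None, None]     # scale every output filter
    b_fused = (b - mean) * scale + beta          # shift the bias accordingly
    return w_fused, b_fused

def quantize_symmetric(x, num_bits=8):
    """Symmetric fixed-point quantization of a tensor to num_bits integers (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax             # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                              # recover approximate values as q * scale
```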
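The Roofline-guided design space exploration is likewise only summarized in the abstract. The sketch below shows one way such an exploration could look: enumerate candidate parallelism factors for each pipeline stage under a DSP budget and keep the allocation whose slowest stage is fastest. The layer workloads, DSP budget, candidate factors, and the simple latency proxy are all assumptions made for illustration; only the 200 MHz clock comes from the reported results.

```python
from itertools import product

# Illustrative per-stage workloads (MAC counts are placeholders, not thesis figures).
layers = [
    {"name": "conv1", "macs": 150e6},
    {"name": "conv2", "macs": 300e6},
    {"name": "conv3", "macs": 600e6},
]

TOTAL_DSP = 2000                         # assumed DSP budget of the target FPGA
FREQ_HZ = 200e6                          # clock frequency reported in the thesis
CANDIDATE_PE = [16, 32, 64, 128, 256]    # assumed parallelism choices per pipeline stage

def stage_cycles(macs, pe):
    """Compute-bound cycle estimate for one stage (assumes 2 MACs per DSP per cycle)."""
    return macs / (2 * pe)

best = None
for alloc in product(CANDIDATE_PE, repeat=len(layers)):
    if sum(alloc) > TOTAL_DSP:
        continue                          # violates the resource roof
    # In a layer-wise pipeline the slowest stage bounds performance; using its cycle
    # count as a latency proxy is a simplification of the thesis's actual model.
    cycles = max(stage_cycles(l["macs"], pe) for l, pe in zip(layers, alloc))
    latency_ms = cycles / FREQ_HZ * 1e3
    if best is None or latency_ms < best[0]:
        best = (latency_ms, alloc)

print("best allocation:", best[1], "estimated latency: %.2f ms" % best[0])
```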
Keywords/Search Tags: CNN, FPGA, rapid design method, fine-grained pipeline, low latency