
Research on a Rapid Design Method for FPGA-Based Convolutional Neural Network Accelerators

Posted on: 2021-11-23    Degree: Master    Type: Thesis
Country: China    Candidate: Q Guo    Full Text: PDF
GTID: 2518306503974609    Subject: IC Engineering
Abstract/Summary:
Convolutional neural networks are widely used in many fields owing to their representation-learning capability. In recent years, as networks have grown in number and size, rapid design methods for FPGA-based convolutional neural network accelerators have become an important research direction. However, existing designs are optimized mainly for throughput, and their overall latency is usually long, which cannot meet the needs of real-time applications. To address this problem, this thesis proposes a rapid design method for convolutional neural network accelerators based on a fine-grained pipeline architecture.

The design flow is divided into two parts: front-end design and back-end design. The front-end performs hardware-oriented preprocessing of the original network through a model parser and model optimization, while the back-end selects parameters for pre-designed hardware templates through design space exploration and assembles the complete hardware code for deployment on the FPGA.

To optimize overall latency, this thesis first applies parameter quantization and layer fusion to reduce computation and storage overhead. Then, exploiting the characteristics of the fine-grained pipeline architecture, a column-based convolution computation scheme is proposed, and the convolution-layer computation template and the on-chip buffer template are designed. Finally, a design space exploration strategy based on the Roofline model is proposed; by rationally allocating hardware resources to each pipeline stage, the design point with the shortest overall latency is obtained.

The proposed design space exploration strategy is verified on AlexNet, VGG-16, Cifar10-fcn and YOLOv2-tiny, and a hardware implementation is completed for YOLOv2-tiny. At a clock frequency of 200 MHz, the accelerator achieves a throughput of 464.5 GOPS, an energy efficiency of 45.3 GOPS/W, and an overall latency of 27.78 ms. Compared with similar designs, the overall latency of this work is significantly reduced while throughput and energy efficiency are improved.
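The abstract mentions parameter quantization and layer fusion but does not spell out the exact scheme. A common instance of layer fusion is folding batch normalization into the preceding convolution, and a common quantization choice is symmetric fixed-point; the Python sketch below illustrates both under those assumptions (the function names and the 8-bit default are illustrative, not taken from the thesis).

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-normalization layer into the preceding convolution.

    w: conv weights, shape (out_ch, in_ch, kh, kw); b: conv bias, shape (out_ch,).
    gamma, beta, mean, var: per-channel BN parameters, shape (out_ch,).
    Returns fused weights and bias so the BN layer can be removed at inference time.
    """
    scale = gamma / np.sqrt(var + eps)           # per-output-channel scale factor
    w_fused = w * scale[:, None, None, None]     # scale every output filter
    b_fused = (b - mean) * scale + beta          # shift the bias accordingly
    return w_fused, b_fused

def quantize_symmetric(x, num_bits=8):
    """Symmetric fixed-point quantization of a tensor to num_bits integers (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax             # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                              # recover approximate values as q * scale
```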
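The Roofline-guided design space exploration is likewise only summarized in the abstract. The sketch below shows one way such an exploration could look: enumerate candidate parallelism factors for each pipeline stage under a DSP budget and keep the allocation whose slowest stage is fastest. The layer workloads, DSP budget, candidate factors, and the simple latency proxy are all assumptions made for illustration; only the 200 MHz clock comes from the reported results.

```python
from itertools import product

# Illustrative per-stage workloads (MAC counts are placeholders, not thesis figures).
layers = [
    {"name": "conv1", "macs": 150e6},
    {"name": "conv2", "macs": 300e6},
    {"name": "conv3", "macs": 600e6},
]

TOTAL_DSP = 2000                         # assumed DSP budget of the target FPGA
FREQ_HZ = 200e6                          # clock frequency reported in the thesis
CANDIDATE_PE = [16, 32, 64, 128, 256]    # assumed parallelism choices per pipeline stage

def stage_cycles(macs, pe):
    """Compute-bound cycle estimate for one stage (assumes 2 MACs per DSP per cycle)."""
    return macs / (2 * pe)

best = None
for alloc in product(CANDIDATE_PE, repeat=len(layers)):
    if sum(alloc) > TOTAL_DSP:
        continue                          # violates the resource roof
    # In a layer-wise pipeline the slowest stage bounds performance; using its cycle
    # count as a latency proxy is a simplification of the thesis's actual model.
    cycles = max(stage_cycles(l["macs"], pe) for l, pe in zip(layers, alloc))
    latency_ms = cycles / FREQ_HZ * 1e3
    if best is None or latency_ms < best[0]:
        best = (latency_ms, alloc)

print("best allocation:", best[1], "estimated latency: %.2f ms" % best[0])
```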
Keywords/Search Tags: CNN, FPGA, rapid design method, fine-grained pipeline, low latency