Research And Design Of YOLO V2 Neural Network Accelerator Based On FPGA

Posted on:2021-06-28

Degree:Master

Type:Thesis

Country:China

Candidate:F H Bi

GTID:2518306197955589

Subject:Computer system architecture

Abstract/Summary:

With the development of deep learning models, target detection based on deep learning has made remarkable progress. However, most existing target detection methods cannot balance accuracy and speed, and so cannot meet the requirements of real-time target detection. The YOLO v2 algorithm, with its simple structure and high speed, offers a new approach to real-time target detection. At present, most convolutional neural network systems run on general-purpose processors such as GPUs and CPUs, while real-time target detection in fields such as autonomous driving, unmanned aerial vehicles and industrial robots places stringent requirements on power consumption and latency. FPGAs, with their low power consumption and high parallelism, are better able to meet these demands, and compared with ASICs they have the advantage of being reconfigurable. FPGA-based YOLO v2 neural network accelerators have therefore become an active research topic.

To design a YOLO v2 neural network accelerator based on FPGA, this thesis addresses both software and hardware. On the software side, it first analyzes the characteristics of FPGAs and the obstacles to deploying the YOLO v2 algorithm in hardware. Second, batch normalization and Leaky ReLU are integrated into the convolution operation to reduce hardware resource consumption. Finally, 8-bit dynamic fixed-point quantization is applied to the whole YOLO v2 network, balancing speed and accuracy.

On the hardware side, an overall design consisting mainly of a convolution module, a pooling module, an auxiliary operation path and a memory access module is presented, following a top-down approach. A dual state machine is added to the convolution module to increase system parallelism and pipelining performance, and a multiplier unit design based on weight sharing is proposed to make fuller use of DSP resources. Finally, a memory access module and a DMA module are designed for communication between the accelerator and external components, which improves the communication efficiency of the system.

Based on the proposed architecture, the accelerator was implemented and simulated in Vivado 2018.2 to verify the design, and then programmed onto a Xilinx KU115 development board. In the experiments, the measured mAP of the system was 80.2% on average, about 2% lower than the unmodified YOLO v2 running on a GPU or CPU. The average frame rate was 37.88 FPS, which is competitive with a GPU, while the power consumption was only about 20 W, roughly 1/12 that of the GPU. Compared with similar research results, this design is better in both speed and accuracy. With its better overall performance, it is suitable for applications with strict latency and energy consumption requirements, such as autonomous driving, unmanned aerial vehicles and industrial robots.
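The two software-side steps named in the abstract, folding batch normalization into the convolution and 8-bit dynamic fixed-point quantization, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the per-channel flat weight layout, the Leaky ReLU slope of 0.1 (the value commonly used in Darknet), and the rule for picking the fractional bit width from the largest absolute value are all assumptions made for illustration.

```python
import math

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of kernel weights per output channel (assumed layout).
    Returns (folded_weights, folded_bias) such that
    conv(x, folded) == batchnorm(conv(x, original)).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append(bt + (b_c - m) * scale)
    return folded_w, folded_b

def leaky_relu(x, slope=0.1):
    """Leaky ReLU; the slope of 0.1 is an assumed value."""
    return x if x > 0 else slope * x

def choose_frac_bits(values, total_bits=8):
    """Pick the fractional bit count so the largest magnitude fits the signed word."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # integer bits needed for magnitude
    return total_bits - 1 - int_bits               # one bit is reserved for the sign

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round to nearest integer, saturate to the signed range."""
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]

def dequantize(q_values, frac_bits):
    """Map quantized integers back to real values."""
    return [q * 2.0 ** -frac_bits for q in q_values]
```

In this sketch each group of values (for example, one layer's weights) gets its own fractional bit width, which is what makes the fixed-point format "dynamic": layers with small weights keep more fractional precision, while layers with large activations trade precision for range.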

Keywords/Search Tags:

Convolutional Neural Network, YOLO v2, Hardware Acceleration, FPGA

Related items

1	Design Of YOLOv3-Tiny Algorithm Based On FPGA
2	Research On Dynamic Quantization Algorithm Of Convolutional Neural Networks And Its Parallel Computing Structure
3	Design And Implementation Of Convolutional Neural Network Acceleration Based On FPGA
4	Acceleration System Design And Implement For Convolutional Neural Network Based On SOC FPGA
5	Design Of Convolutional Neural Network Acceleration System And FPGA Verification
6	Research On CNN Network Acceleration For Image Classification Based On FPGA
7	Research On The Acceleration Of Tiny-yolo Convolution Neural Network Based On HLS
8	Research And Implementation Of Acceleration Of Binary Convolutional Neural Network Based On FPGA
9	Research On Hardware Acceleration Based On FPGA Of Convolutional Neural Network And Elliptic Curve Algorithm
10	Research On FPGA Acceleration Of Neural Network Algorithm