
Research And Design Of YOLO V2 Neural Network Accelerator Based On FPGA

Posted on: 2021-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: F H Bi
Full Text: PDF
GTID: 2518306197955589
Subject: Computer system architecture
Abstract/Summary:
As deep learning models have developed, deep-learning-based object detection has made remarkable progress. However, most existing detection methods cannot balance accuracy and speed, and therefore cannot meet real-time detection requirements. The YOLO v2 algorithm, with its simple structure and high speed, offers a new approach to real-time object detection. Most convolutional neural network systems currently run on general-purpose processors such as GPUs and CPUs, whereas real-time object detection in fields such as autonomous driving, unmanned aerial vehicles, and industrial robots places stringent requirements on power consumption and latency. With its low power consumption and high parallelism, the FPGA is better able to meet these demands, and unlike an ASIC it is reconfigurable, so FPGA-based YOLO v2 neural network accelerators have become an active research topic. To design such an accelerator, this thesis addresses both software and hardware. On the software side, it first analyzes the characteristics of the FPGA and the obstacles to deploying the YOLO v2 algorithm in hardware. Second, batch normalization and Leaky ReLU are fused into the convolution operation to reduce hardware resource consumption. Finally, 8-bit dynamic fixed-point quantization is applied to the whole YOLO v2 network to balance speed and accuracy. On the hardware side, a top-down overall design is presented whose main components are a convolution module, a pooling module, an auxiliary operation path, and a memory access module. A dual state machine is added to the convolution module to increase parallelism and pipelining performance, and a multiplier unit design based on weight sharing is proposed to further improve the utilization of DSP resources. Finally, a memory access module and a DMA module are designed for communication between the accelerator and external memory, which improves the communication efficiency of the system. Based on the proposed architecture, the accelerator is implemented and simulated in Vivado 2018.2 to verify the design, and is then deployed on a Xilinx KU115 development board. The measured mAP averages 80.2%, about 2% lower than the unmodified YOLO v2 running on a GPU or CPU, but the frame rate averages 37.88 FPS, which offers some advantage relative to a GPU, and the power consumption is only about 20 W, roughly 1/12 that of the GPU. Compared with similar research results, this design is better in both speed and accuracy. With its better overall performance, it is suited to applications with strict latency and energy consumption requirements, such as autonomous driving, unmanned aerial vehicles, and industrial robots.
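The two software-side steps named in the abstract, folding batch normalization into the convolution and 8-bit dynamic fixed-point quantization, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names (fold_batch_norm, choose_frac_bits, quantize, dequantize), the per-tensor choice of fractional bits, and the round-and-clamp rule are assumptions made for illustration, and the Leaky ReLU fusion is not shown.

```python
import math

def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold one output channel's batch-norm parameters into its conv weights and bias.

    BN(conv(x)) = gamma * (w.x + bias - mean) / sqrt(var + eps) + beta
                = (w * scale).x + ((bias - mean) * scale + beta)
    """
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias

def choose_frac_bits(values, bits=8):
    """Pick the fractional bit count so the largest magnitude fits a signed `bits`-bit code.

    Assumption: one shared exponent per tensor, which is a common reading of
    "dynamic fixed point"; the thesis may group values differently.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits needed left of the binary point
    return bits - 1 - int_bits                     # one bit is reserved for the sign

def quantize(values, frac_bits, bits=8):
    """Round each value to the nearest code and clamp to the signed range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(lo, min(hi, round(v * 2.0 ** frac_bits))) for v in values]

def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    return [q * 2.0 ** (-frac_bits) for q in codes]

# Hypothetical values, chosen only to make the arithmetic easy to follow.
w, b = fold_batch_norm([1.0, 2.0], 0.5, gamma=2.0, beta=0.1, mean=0.3, var=4.0, eps=0.0)
frac_bits = choose_frac_bits([0.5, -0.25, 3.5])   # 5 fractional bits
codes = quantize([0.5, -0.25, 3.5], frac_bits)    # [16, -8, 112]
```

After folding, inference needs only a convolution with the new weights and bias, so no separate normalization hardware is required. The quantizer stores each tensor as 8-bit integers plus one shared exponent, which is what lets the multipliers operate on narrow integer operands.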
Keywords/Search Tags: Convolutional Neural Network, YOLO v2, Hardware Acceleration, FPGA