Research On YOLOv3 Algorithm Architecture Based On FPGA Platform

Posted on:2021-01-24

Degree:Master

Type:Thesis

Country:China

Candidate:A B Wang

Full Text:PDF

GTID:2518306050967039

Subject:Computer Science and Technology

Abstract/Summary:

Neural network algorithms extract features more effectively than traditional algorithms, which gives them outstanding performance in computer vision. They have become the principal algorithms in image classification, object detection, and speech recognition. However, their complex structure, large data volume, and storage- and computation-intensive nature make them difficult to deploy widely in practical scenarios. Although CPU and GPU platforms provide convenient development frameworks and sufficient computing power, their energy consumption is too high for low-power scenarios. An FPGA is well suited to this problem because of its flexible configuration, high parallelism, and low power consumption. In object detection, YOLOv3 is a fully convolutional network; compared with other algorithms, it is easier to implement in hardware and achieves higher accuracy. Against this background, this thesis studies how to design a YOLOv3 architecture on an FPGA platform.

The thesis first surveys convolution-based object detection algorithms and neural network accelerators. It then analyzes the principle and network structure of YOLOv3 in depth and, taking the computing characteristics of the FPGA into account, modifies the algorithm to reduce its computational complexity and storage requirements. For the inference process, the thesis draws on the mainstream neural network architecture design methods in industry to design a reusable, instruction-driven architecture. While preserving detection accuracy, it applies a dynamic fixed-point quantization method to the data to match the arithmetic of the FPGA platform. After analyzing the characteristics of the algorithm, it designs a dedicated instruction set based on the very long instruction word (VLIW) and combines it with a superpipelined processor to build the instruction system on the FPGA, which increases the parallelism of instruction execution and reduces instruction latency. To unify memory management, it places on-chip and off-chip storage in a single address space and designs a memory management system. Based on the computing characteristics of the DSP blocks in the FPGA, it designs a high-speed DSP systolic ("pulse") array as the core computing unit. The surrounding weight, feature-map, and partial-sum modules feed this array, and convolution is performed with a loop-tiling strategy that reuses feature maps and weights. The thesis also designs peripheral computing modules such as ReLU and Upsample.

Finally, the design is implemented in Verilog on the Xilinx VCU118 platform, and the instruction generation process is described in detail. PCIe logic is also implemented on the FPGA to connect it to a PC. The quantized YOLOv3 achieves a detection accuracy of 51.2% on the COCO dataset, which is 3.4% lower than the original accuracy. Processing a 416×416×3 image takes 449.8 ms, and the throughput reaches 146 GOPS.
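
The two performance figures reported in the abstract can be cross-checked against each other. The short Python calculation below uses only the quoted latency and throughput; the variable names are illustrative, and it assumes the throughput figure covers one full inference pass per image.

```python
# Figures quoted in the abstract.
latency_s = 449.8e-3       # processing time for one 416x416x3 image, in seconds
throughput_gops = 146.0    # reported throughput, in giga-operations per second

# Frame rate is the reciprocal of the per-image latency.
frames_per_second = 1.0 / latency_s

# Workload per image implied by the two reported figures
# (assumption: the throughput is measured over one complete inference).
implied_gop_per_image = throughput_gops * latency_s

print(f"{frames_per_second:.2f} frames/s")           # 2.22 frames/s
print(f"{implied_gop_per_image:.1f} GOP per image")  # 65.7 GOP per image
```

The implied workload of about 65.7 giga-operations per image is close to the roughly 65.9 billion operations commonly quoted for YOLOv3 at a 416×416 input, so the latency and throughput figures appear mutually consistent.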
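
The abstract does not give the details of its dynamic fixed-point quantization, so the following is only a minimal sketch of the general technique: every tensor shares one bit width, but each tensor gets its own fractional length chosen from its value range. The 8-bit width, the function names, and the sample values are assumptions for illustration, not taken from the thesis.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Pick a per-tensor fractional length so the largest magnitude fits.

    total_bits=8 is an assumed width; the thesis abstract does not state one.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Integer bits needed for the magnitude; one bit is reserved for the sign.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits

def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed integer range."""
    scale = 2.0 ** frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [min(hi, max(lo, round(v * scale))) for v in values]

def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]

# Hypothetical sample tensors with different dynamic ranges.
weights = [0.75, -0.5, 0.125, 0.3]
activations = [3.2, -1.1, 0.4]

w_frac = choose_frac_bits(weights)       # small values keep more fractional bits
a_frac = choose_frac_bits(activations)   # larger values need more integer bits
w_codes = quantize(weights, w_frac)
a_codes = quantize(activations, a_frac)
```

Because the fractional length is a power-of-two scale, rescaling between layers reduces to bit shifts, which is one reason this style of quantization is often paired with FPGA DSP arithmetic.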

Keywords/Search Tags:

FPGA, YOLOv3, Very Long Instruction Word, Quantization, DSP Pulse Array

Related items

1	Design And Implementation Of GCC Instruction Scheduling Algorithm Based On TMS320C6000
2	Very long instruction word architectures for digital signal processing
3	Instruction-flow Scheduling Mechanism For High-performance SIMD DSP
4	Research And Implementation Of FT64-2 Kernel Assembler
5	Optimization And Design Of Instruction Dispatch Unit In 600MHz YHFT-DX
6	The Study Of Simultaneous Multithreading In VLIW Processors
7	Research On The Hardware Acceleration For High-precision Algorithm Based-on Very Long Instruction Word Framework
8	Study On Medical Periodic Long-pulse Nd:YAG Laser
9	Research And Design Of Unified Shader With Automatic Scheduling Of Threads And VLIW
10	FPGA-based Real-time Pulse Parameter Measurement Techniques