
Research On Software And Hardware Acceleration Of Acc-YOLOv4 Object Detection Algorithm

Posted on: 2022-04-10  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Zhang  Full Text: PDF
GTID: 2518306569497584  Subject: Computer technology
Abstract/Summary:
With the development of deep neural network technology, deep learning is being applied in more and more fields. In object detection, algorithms based on deep neural networks have replaced traditional methods as the mainstream approach, and both accuracy and efficiency continue to improve. However, as computer performance and network design have advanced, the models used for object detection have grown increasingly complex and computationally expensive. Although the accuracy of common detection algorithms keeps improving, their use on embedded devices is limited by network scale and computational cost. To run convolutional networks on embedded devices, this thesis designs an accelerated convolution-layer implementation on a field-programmable gate array (FPGA).

To address the excessive parameter count and heavy run-time computation of deep detection models, a model compression and acceleration scheme based on the YOLOv4 algorithm is designed for both single-class and multi-class detection tasks. For the single-class task, sparse training and channel pruning are designed to compress the model parameters, and knowledge distillation training is designed to recover model accuracy after compression. Together, these methods achieve large-scale compression of the model parameters while keeping the detection accuracy loss within a bounded range. For the multi-class task, a feature extraction network based on depthwise separable convolution is designed to replace the original backbone; at the cost of some detection accuracy, the model's parameters and computation are reduced by 68% compared with the original. For both compressed models, the parameters are quantized to 16-bit floating point, which halves the storage they occupy and prepares them for hardware acceleration.

To address the heavy computation and slow execution of convolutional neural networks on embedded devices, a convolutional network structure and depthwise-separable-convolution component modules suited to FPGAs are designed. According to the properties of the FPGA device, hardware versions of standard convolution and depthwise separable convolution are designed. The compression-optimized detection models are deployed on the FPGA using convolution loop expansion, parameter reuse, output buffering, and other techniques, and a fast post-processing acceleration algorithm combined with the quantization method is proposed. An acceleration scheme for the model's convolutional network is designed, along with an acceleration scheme for the depthwise-separable-convolution bottleneck layer structure. Finally, two object detection algorithms that run efficiently on the FPGA device are realized.

Experimental verification identifies the best model pruning scheme, confirms the effect of knowledge distillation, and selects the overall model compression scheme; comparative experiments verify the effect of the separable-convolution network. The Acc-YOLOv4 algorithm is deployed on a PYNQ board, and the experimental results show that both the algorithm model and the hardware acceleration scheme are effective and practical.
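The channel pruning described above typically works by L1-regularizing the batch-normalization scale factors during sparse training, then discarding channels whose scales shrink toward zero. The following is a minimal sketch of the channel-selection step only; the `gammas` values, the `prune_mask` helper, and the keep ratio are illustrative assumptions, not the thesis's actual code or thresholds.

```python
import numpy as np

# Hypothetical BN scale factors (gamma) for one conv layer after
# L1-regularized sparse training; near-zero gammas mark prunable channels.
gammas = np.array([0.91, 0.02, 0.47, 0.01, 0.63, 0.03, 0.88, 0.02])

def prune_mask(gammas, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of channels ranked by |gamma|."""
    k = max(1, int(round(len(gammas) * keep_ratio)))
    threshold = np.sort(np.abs(gammas))[::-1][k - 1]
    return np.abs(gammas) >= threshold

mask = prune_mask(gammas, keep_ratio=0.5)  # True = keep this channel
```

In a full pipeline, the mask would then be used to slice the convolution weights of this layer and of the layer that consumes its output, after which the smaller network is fine-tuned.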
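Knowledge distillation, used above to recover accuracy after pruning, usually trains the compressed student to match the temperature-softened output distribution of the uncompressed teacher. Below is a minimal numpy sketch of the soft-label term of such a loss; the function names, the temperature `T=4.0`, and the weight `alpha` are assumptions for illustration, and the hard-label cross-entropy term is omitted.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = logits / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0, alpha=0.7):
    """Soft-label distillation term: alpha * T^2 * KL(teacher || student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
    return alpha * (T ** 2) * kl
```

The T² factor compensates for the 1/T² scaling of gradients introduced by the softened softmax, so the soft and hard loss terms stay comparable in magnitude.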
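The parameter savings from depthwise separable convolution can be seen directly from its factorization: a standard k×k convolution is replaced by a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. The sketch below counts parameters for one hypothetical layer shape (128→256 channels, 3×3 kernel); the exact 68% figure in the abstract applies to the whole network, not to this single layer.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filters plus 1 x 1 pointwise filters."""
    return c_in * k * k + c_in * c_out

std = standard_conv_params(128, 256, 3)            # 294912 weights
dws = depthwise_separable_params(128, 256, 3)      # 1152 + 32768 = 33920
ratio = dws / std                                  # roughly 0.115
```

The general reduction factor is 1/c_out + 1/k², which is why the savings grow with both the channel count and the kernel size.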
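The claim that 16-bit floating-point quantization halves parameter storage follows directly from the element width (2 bytes instead of 4). A minimal numpy sketch, using randomly generated stand-in weights rather than any real model:

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.standard_normal(1000).astype(np.float32)  # stand-in FP32 weights

w16 = w32.astype(np.float16)                        # quantize to FP16

# Storage halves; the rounding error stays small for typical weight scales,
# since FP16 keeps a 10-bit mantissa (~3 decimal digits of precision).
err = float(np.max(np.abs(w32 - w16.astype(np.float32))))
```

On FPGA targets this also halves memory bandwidth per parameter, which is often the bigger win than the storage saving itself.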
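The FPGA techniques named above (convolution loop expansion, parameter reuse, output buffering) can be illustrated in software by restructuring a direct convolution into tiles: an on-chip-style output buffer accumulates a tile of output channels, each loaded weight is reused across a full input patch, and the innermost per-tile loop is the part a hardware tool would unroll into parallel multiply-accumulate units. This is a behavioral Python sketch of that loop structure under assumed layouts (channels-first, stride 1, no padding), not the thesis's HLS code.

```python
import numpy as np

def conv2d_tiled(x, w, tile=4):
    """Direct convolution tiled over output channels.

    x: input feature map, shape (c_in, H, W)
    w: weights, shape (c_out, c_in, k, k)
    """
    c_out, c_in, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, wd), dtype=x.dtype)
    for oc0 in range(0, c_out, tile):              # tile over output channels
        n = min(tile, c_out - oc0)
        buf = np.zeros((n, h, wd), dtype=x.dtype)  # output buffer (BRAM analogue)
        for ic in range(c_in):                     # each input channel streamed once
            for ky in range(k):
                for kx in range(k):
                    patch = x[ic, ky:ky + h, kx:kx + wd]
                    for t in range(n):             # loop a tool would unroll in hardware
                        buf[t] += w[oc0 + t, ic, ky, kx] * patch
        out[oc0:oc0 + n] = buf                     # write buffer back once per tile
    return out

# Tiny check case: all-ones input and 2x2 all-ones kernels -> every output is 4.
y = conv2d_tiled(np.ones((1, 3, 3)), np.ones((2, 1, 2, 2)))
```

The buffer write-back once per tile is what reduces external memory traffic on a real device: partial sums stay on chip until the tile is complete.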
Keywords/Search Tags: object detection, FPGA, model compression, hardware acceleration