
Research On Software And Hardware Acceleration Of Acc-YOLOv4 Object Detection Algorithm

Posted on: 2022-04-10  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Zhang  Full Text: PDF
GTID: 2518306569497584  Subject: Computer technology
Abstract/Summary:
With the development of deep neural network technology, deep learning is being applied in more and more fields. In object detection, algorithms based on deep neural networks have replaced traditional methods as the mainstream approach, and both accuracy and efficiency continue to improve. However, as computer performance and network design have advanced, the models used for object detection have grown increasingly complex and computationally expensive. Although the accuracy of common detection algorithms keeps improving, their use on embedded devices is limited by network scale and computational cost. To run convolutional networks on embedded devices, this thesis designs an accelerated convolution-layer implementation on a field-programmable gate array (FPGA).

To address the excessive parameter count and heavy run-time computation of deep detection models, a model compression and acceleration scheme based on the YOLOv4 algorithm is designed for both single-class and multi-class detection tasks. For the single-class task, sparse training and channel pruning are designed to compress the model parameters, and knowledge distillation training is designed to recover model accuracy after compression. Together, these methods achieve large-scale compression of the model parameters while keeping the detection accuracy loss within a bounded range. For the multi-class task, a feature extraction network based on depthwise separable convolution is designed to replace the original backbone; at the cost of some detection accuracy, the model's parameters and computation are reduced by 68% compared with the original. For both compressed models, the parameters are quantized to 16-bit floating point, which halves the storage they occupy and prepares them for hardware acceleration.

To address the heavy computation and slow execution of convolutional neural networks on embedded devices, a convolutional network structure and depthwise-separable-convolution component modules suited to FPGAs are designed. According to the properties of the FPGA device, hardware versions of standard convolution and depthwise separable convolution are designed. The compression-optimized detection models are deployed on the FPGA using convolution loop expansion, parameter reuse, output buffering, and other techniques, and a fast post-processing acceleration algorithm combined with the quantization method is proposed. An acceleration scheme for the model's convolutional network is designed, along with an acceleration scheme for the depthwise-separable-convolution bottleneck layer structure. Finally, two object detection algorithms that run efficiently on the FPGA device are realized.

Experimental verification identifies the best model pruning scheme, confirms the effect of knowledge distillation, and selects the overall model compression scheme; comparative experiments verify the effect of the separable-convolution network. The Acc-YOLOv4 algorithm is deployed on a PYNQ board, and the experimental results show that both the algorithm model and the hardware acceleration scheme are effective and practical.
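The channel pruning described above typically works by L1-regularizing the batch-normalization scale factors during sparse training, then discarding channels whose scales shrink toward zero. The following is a minimal sketch of the channel-selection step only; the `gammas` values, the `prune_mask` helper, and the keep ratio are illustrative assumptions, not the thesis's actual code or thresholds.

```python
import numpy as np

# Hypothetical BN scale factors (gamma) for one conv layer after
# L1-regularized sparse training; near-zero gammas mark prunable channels.
gammas = np.array([0.91, 0.02, 0.47, 0.01, 0.63, 0.03, 0.88, 0.02])

def prune_mask(gammas, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of channels ranked by |gamma|."""
    k = max(1, int(round(len(gammas) * keep_ratio)))
    threshold = np.sort(np.abs(gammas))[::-1][k - 1]
    return np.abs(gammas) >= threshold

mask = prune_mask(gammas, keep_ratio=0.5)  # True = keep this channel
```

In a full pipeline, the mask would then be used to slice the convolution weights of this layer and of the layer that consumes its output, after which the smaller network is fine-tuned.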
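Knowledge distillation, used above to recover accuracy after pruning, usually trains the compressed student to match the temperature-softened output distribution of the uncompressed teacher. Below is a minimal numpy sketch of the soft-label term of such a loss; the function names, the temperature `T=4.0`, and the weight `alpha` are assumptions for illustration, and the hard-label cross-entropy term is omitted.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = logits / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0, alpha=0.7):
    """Soft-label distillation term: alpha * T^2 * KL(teacher || student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
    return alpha * (T ** 2) * kl
```

The T² factor compensates for the 1/T² scaling of gradients introduced by the softened softmax, so the soft and hard loss terms stay comparable in magnitude.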
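The parameter savings from depthwise separable convolution can be seen directly from its factorization: a standard k×k convolution is replaced by a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. The sketch below counts parameters for one hypothetical layer shape (128→256 channels, 3×3 kernel); the exact 68% figure in the abstract applies to the whole network, not to this single layer.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filters plus 1 x 1 pointwise filters."""
    return c_in * k * k + c_in * c_out

std = standard_conv_params(128, 256, 3)            # 294912 weights
dws = depthwise_separable_params(128, 256, 3)      # 1152 + 32768 = 33920
ratio = dws / std                                  # roughly 0.115
```

The general reduction factor is 1/c_out + 1/k², which is why the savings grow with both the channel count and the kernel size.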
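The claim that 16-bit floating-point quantization halves parameter storage follows directly from the element width (2 bytes instead of 4). A minimal numpy sketch, using randomly generated stand-in weights rather than any real model:

```python
import numpy as np

rng = np.random.default_rng(0)
w32 = rng.standard_normal(1000).astype(np.float32)  # stand-in FP32 weights

w16 = w32.astype(np.float16)                        # quantize to FP16

# Storage halves; the rounding error stays small for typical weight scales,
# since FP16 keeps a 10-bit mantissa (~3 decimal digits of precision).
err = float(np.max(np.abs(w32 - w16.astype(np.float32))))
```

On FPGA targets this also halves memory bandwidth per parameter, which is often the bigger win than the storage saving itself.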
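The FPGA techniques named above (convolution loop expansion, parameter reuse, output buffering) can be illustrated in software by restructuring a direct convolution into tiles: an on-chip-style output buffer accumulates a tile of output channels, each loaded weight is reused across a full input patch, and the innermost per-tile loop is the part a hardware tool would unroll into parallel multiply-accumulate units. This is a behavioral Python sketch of that loop structure under assumed layouts (channels-first, stride 1, no padding), not the thesis's HLS code.

```python
import numpy as np

def conv2d_tiled(x, w, tile=4):
    """Direct convolution tiled over output channels.

    x: input feature map, shape (c_in, H, W)
    w: weights, shape (c_out, c_in, k, k)
    """
    c_out, c_in, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, wd), dtype=x.dtype)
    for oc0 in range(0, c_out, tile):              # tile over output channels
        n = min(tile, c_out - oc0)
        buf = np.zeros((n, h, wd), dtype=x.dtype)  # output buffer (BRAM analogue)
        for ic in range(c_in):                     # each input channel streamed once
            for ky in range(k):
                for kx in range(k):
                    patch = x[ic, ky:ky + h, kx:kx + wd]
                    for t in range(n):             # loop a tool would unroll in hardware
                        buf[t] += w[oc0 + t, ic, ky, kx] * patch
        out[oc0:oc0 + n] = buf                     # write buffer back once per tile
    return out

# Tiny check case: all-ones input and 2x2 all-ones kernels -> every output is 4.
y = conv2d_tiled(np.ones((1, 3, 3)), np.ones((2, 1, 2, 2)))
```

The buffer write-back once per tile is what reduces external memory traffic on a real device: partial sums stay on chip until the tile is complete.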
Keywords/Search Tags: object detection, FPGA, model compression, hardware acceleration