Optimization And Implementation Of YOLOv3 Model Based On FPGA

Posted on: 2021-12-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Du
GTID: 2518306050466444 | Subject: Circuits and Systems
Abstract/Summary:
With advances in artificial intelligence and hardware, target detection technology has developed rapidly in recent years. Deep learning-based detection algorithms continue to emerge and are gradually being applied in many areas of daily life. However, almost all commonly used object detection algorithms have high computational complexity and rely on high-performance computers to complete the task. Many target detection applications, such as autonomous driving, navigation guidance, and traffic supervision, require the algorithm to run on mobile devices, which often lack high-performance computing capability. As a result, the better-performing deep learning detection algorithms are difficult to deploy on mobile devices. This thesis carries out targeted algorithm optimization and hardware implementation, based on the computing characteristics of FPGAs and the strengths and weaknesses of the YOLOv3 target detection algorithm, to fill this gap in target detection on mobile devices. The main research contents and results are as follows:

1. Exploiting the structure of the convolution operation and the parallel computing capability of the FPGA, a convolution multiplier array is designed in the FPGA. Input and output data bit widths are unified, and more than 2,000 DSP units are used simultaneously for multiplication. This achieves highly parallel computation, with an operation speed 256 times that of a single convolution operation.

2. To address the problem that convolution input data is read too many times, a three-row shift register buffers the data so that each pixel needs to be read only once to complete all convolution operations. This reduces the number of pixel reads to one-ninth of the original.

3. To address the large amount of cached data in the target detection algorithm, a special RAM interaction scheme is designed. A high-performance computer can use up to 16 GB of memory to cache all operational data, but the FPGA has no memory resource of that size. The scheme uses only a little over one hundred Mbit of RAM to run the entire algorithm and output the detection results.

4. The YOLOv3 algorithm consists of a feature extraction part and a three-scale detection part. A pipelined design is used, which effectively improves the operation speed and reduces the operation time to half of the original.

5. Based on the designed FPGA architecture, an algorithm optimization strategy is proposed. First, when training the YOLOv3 model, regularization constraints are added to the batch normalization (BN) layer parameters so that more of them move toward 0, which allows the network to be pruned with less impact on accuracy. Second, to reduce the size of the model parameters, finite-bit quantization is applied to them, shrinking the parameters as much as possible at the cost of a small loss in accuracy. Finally, with reference to existing network pruning methods, a new pruning method suited to the hardware design of this thesis is proposed. These three methods are used to optimize the model before it is deployed on the FPGA, which greatly improves the operation speed of the algorithm.
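The three-row shift register buffering described in item 2 can be illustrated in software. The following is a minimal Python sketch, not the thesis's hardware design: the function names `conv3x3_naive` and `conv3x3_line_buffer`, the 3x3 kernel size, and the unpadded ("valid") output are assumptions made for illustration. It streams pixels one at a time into a shift register three rows long and taps the 3x3 window from it, so each source pixel is read exactly once, whereas the direct version re-reads nine pixels per output.

```python
from collections import deque


def conv3x3_naive(image, kernel):
    """Direct 3x3 convolution (no kernel flip, no padding).

    Every output re-reads its nine input pixels from the source image.
    Returns (output, number_of_source_reads).
    """
    h, w = len(image), len(image[0])
    reads = 0
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += image[r + i][c + j] * kernel[i][j]
                    reads += 1
            row.append(acc)
        out.append(row)
    return out, reads


def conv3x3_line_buffer(image, kernel):
    """Streaming 3x3 convolution using a three-row shift register.

    Pixels arrive in raster order and are read from the source once each.
    Assumes the image is at least 3 pixels wide.
    Returns (output, number_of_source_reads).
    """
    h, w = len(image), len(image[0])
    shift_reg = deque(maxlen=3 * w)  # software stand-in for the row buffer
    reads = 0
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            shift_reg.append(image[r][c])  # the only read of this pixel
            reads += 1
            if r >= 2 and c >= 2:
                acc = 0
                for i in range(3):
                    for j in range(3):
                        # Pixel (r-2+i, c-2+j) sits this many steps behind
                        # the newest entry in raster order.
                        behind = (2 - i) * w + (2 - j)
                        acc += shift_reg[-1 - behind] * kernel[i][j]
                row.append(acc)
        if row:
            out.append(row)
    return out, reads
```

Without padding, the direct version performs 9(H-2)(W-2) source reads against H·W for the streaming version, so the ratio approaches nine as the image grows; with padding that keeps the output the same size as the input, the ratio is exactly nine, matching the one-ninth figure stated in the abstract.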
Keywords/Search Tags:FPGA, Convolutional Neural Network, YOLOv3, Model optimization