
FPGA Implementation and Compression of the Faster R-CNN Object Detection Network Algorithm

Posted on: 2022-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wu
Full Text: PDF
GTID: 2518306563960929
Subject: Electronics and Communications Engineering
Abstract/Summary:
Deep convolutional neural networks have achieved remarkable results in image recognition thanks to their excellent nonlinear expressive power and feature extraction ability. There are also many outstanding works in object detection based on deep convolutional neural networks; Faster R-CNN, built on a region proposal network, is one representative work, offering high detection accuracy, especially for small targets. However, its heavy computational cost limits deployment on mobile or embedded devices. This thesis analyzes the structure, parameters, and computation of the Faster R-CNN model in detail, then compresses the model's parameters and computation module by module and stage by stage, and finally designs and implements an object detection accelerator suited to a heterogeneous computing platform. The main contributions are as follows:

(1) A multi-stage joint model compression method is proposed. In the first stage, sparsification of the backbone network is separated from sparsification of the whole model, which improves the sparsity quality of the backbone and reduces the difficulty of sparsifying the full model. In the second stage, a lightweight feature pyramid network replaces the standard feature pyramid network, avoiding the large amount of computation the standard pyramid introduces when processing high-resolution feature maps. In the third stage, an online quantization method quantizes the model's weights and activations to 8 bits.

(2) A hardware accelerator circuit is implemented based on the ABM-SpConv (Accumulate-Before-Multiply Sparse Convolution) algorithm. On top of this accelerator, data parallelism and task parallelism are realized through multiple multiplexing schemes. To relieve pressure on physical bandwidth and simplify the circuit controller, a weight coding scheme is designed and implemented; to resolve data access conflicts in parallel computing, a single-write multi-read port buffer is designed and implemented.

(3) A software-hardware co-design method is proposed. First, the structure and parameters of the Faster R-CNN model are statically analyzed, on the basis of which the software model is optimized and some hardware design parameters are determined; then the theoretical performance of the hardware accelerator is mathematically modeled and analyzed; finally, the design space is explored.

With less than a 2% loss in mAP on the Pascal VOC dataset, the compression method used in this thesis reduces the Faster R-CNN model size by a factor of 38 and its computation by a factor of 12.8, with a final detection accuracy of 80.3%. The accelerator circuit is realized on an Intel Arria 10 GX 1150 FPGA chip; the peak throughput is 214 GOP/s and the detection speed is 14.9 FPS.
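The staged sparsification in contribution (1) is a pruning process. As an illustration of the basic mechanism only, here is a minimal magnitude-based pruning sketch; the thesis's backbone-first schedule and its actual pruning criterion are not reproduced, and the function name and its sparsity parameter are hypothetical:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest
    magnitudes (illustrative sketch, not the thesis's staged method)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In a staged scheme like the one described, such a step would first be applied (and fine-tuned) on the backbone alone before sparsifying the remaining layers.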
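The third compression stage quantizes weights and activations to 8 bits. A minimal symmetric per-tensor INT8 sketch, assuming a simple max-abs scale; the thesis's online quantization scheme is not specified here, so this is only illustrative:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization (illustrative sketch)."""
    m = float(np.max(np.abs(x)))
    scale = m / 127.0 if m > 0 else 1.0
    # round to the nearest integer grid point, saturate to int8 range
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale
```

The quantization error of each element is bounded by the scale, which is what keeps the mAP loss small when the dynamic range of weights and activations is well behaved.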
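The accelerator in contribution (2) builds on ABM-SpConv, which exploits the fact that an 8-bit quantized kernel contains few distinct weight values: activations that share a weight value are accumulated first, so each distinct nonzero value costs a single multiplication, and zero weights are skipped entirely. A minimal scalar sketch of that idea follows; the function name and data layout are illustrative, not the thesis's hardware design:

```python
def abm_spconv_1ch(activations, weights):
    """Accumulate-Before-Multiply sparse dot product (illustrative sketch).

    Instead of one multiply per kernel tap, sum the activations matching
    each distinct nonzero weight value, then multiply each sum once.
    """
    acc = {}  # distinct weight value -> running sum of matching activations
    for a, w in zip(activations, weights):
        if w != 0:  # sparsity: zero weights contribute nothing
            acc[w] = acc.get(w, 0) + a
    # one multiplication per distinct weight value
    return sum(w * s for w, s in acc.items())
```

For example, with weights (2, 0, 2, 3) only two multiplications are needed instead of three, and in hardware this trade of multipliers for adders is what makes the scheme attractive on FPGA fabric.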
Keywords/Search Tags: Convolutional neural network, Object detection, Heterogeneous computing, Pruning, Quantization