
FPGA Implementation and Compression of the Faster R-CNN Object Detection Network Algorithm

Posted on: 2022-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wu
Full Text: PDF
GTID: 2518306563960929
Subject: Electronics and Communications Engineering
Abstract/Summary:
Deep convolutional neural networks have achieved remarkable results in image recognition thanks to their excellent nonlinear expressive power and feature extraction ability. There are also many outstanding works in object detection based on deep convolutional neural networks; Faster R-CNN, built on a region proposal network, is one representative work, offering high detection accuracy, especially for small targets. However, its heavy computational cost limits deployment on mobile or embedded devices. This thesis analyzes the structure, parameters, and computation of the Faster R-CNN model in detail, then compresses the model's parameters and computation module by module and stage by stage, and finally designs and implements an object detection accelerator suited to a heterogeneous computing platform. The main contributions are as follows:

(1) A multi-stage joint model compression method is proposed. In the first stage, sparsification of the backbone network is separated from sparsification of the whole model, which improves the sparsity quality of the backbone and reduces the difficulty of sparsifying the full model. In the second stage, a lightweight feature pyramid network replaces the standard feature pyramid network, avoiding the large amount of computation the standard pyramid introduces when processing high-resolution feature maps. In the third stage, an online quantization method quantizes the model's weights and activations to 8 bits.

(2) A hardware accelerator circuit is implemented based on the ABM-SpConv (Accumulate-Before-Multiply Sparse Convolution) algorithm. On top of this accelerator, data parallelism and task parallelism are realized through multiple multiplexing schemes. To relieve pressure on physical bandwidth and simplify the circuit controller, a weight coding scheme is designed and implemented; to resolve data access conflicts in parallel computing, a single-write multi-read port buffer is designed and implemented.

(3) A software-hardware co-design method is proposed. First, the structure and parameters of the Faster R-CNN model are statically analyzed, on the basis of which the software model is optimized and some hardware design parameters are determined; then the theoretical performance of the hardware accelerator is mathematically modeled and analyzed; finally, the design space is explored.

With less than a 2% loss in mAP on the Pascal VOC dataset, the compression method used in this thesis reduces the Faster R-CNN model size by a factor of 38 and its computation by a factor of 12.8, with a final detection accuracy of 80.3%. The accelerator circuit is realized on an Intel Arria 10 GX 1150 FPGA chip; the peak throughput is 214 GOP/s and the detection speed is 14.9 FPS.
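The staged sparsification in contribution (1) is a pruning process. As an illustration of the basic mechanism only, here is a minimal magnitude-based pruning sketch; the thesis's backbone-first schedule and its actual pruning criterion are not reproduced, and the function name and its sparsity parameter are hypothetical:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest
    magnitudes (illustrative sketch, not the thesis's staged method)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In a staged scheme like the one described, such a step would first be applied (and fine-tuned) on the backbone alone before sparsifying the remaining layers.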
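The third compression stage quantizes weights and activations to 8 bits. A minimal symmetric per-tensor INT8 sketch, assuming a simple max-abs scale; the thesis's online quantization scheme is not specified here, so this is only illustrative:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization (illustrative sketch)."""
    m = float(np.max(np.abs(x)))
    scale = m / 127.0 if m > 0 else 1.0
    # round to the nearest integer grid point, saturate to int8 range
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale
```

The quantization error of each element is bounded by the scale, which is what keeps the mAP loss small when the dynamic range of weights and activations is well behaved.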
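The accelerator in contribution (2) builds on ABM-SpConv, which exploits the fact that an 8-bit quantized kernel contains few distinct weight values: activations that share a weight value are accumulated first, so each distinct nonzero value costs a single multiplication, and zero weights are skipped entirely. A minimal scalar sketch of that idea follows; the function name and data layout are illustrative, not the thesis's hardware design:

```python
def abm_spconv_1ch(activations, weights):
    """Accumulate-Before-Multiply sparse dot product (illustrative sketch).

    Instead of one multiply per kernel tap, sum the activations matching
    each distinct nonzero weight value, then multiply each sum once.
    """
    acc = {}  # distinct weight value -> running sum of matching activations
    for a, w in zip(activations, weights):
        if w != 0:  # sparsity: zero weights contribute nothing
            acc[w] = acc.get(w, 0) + a
    # one multiplication per distinct weight value
    return sum(w * s for w, s in acc.items())
```

For example, with weights (2, 0, 2, 3) only two multiplications are needed instead of three, and in hardware this trade of multipliers for adders is what makes the scheme attractive on FPGA fabric.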
Keywords/Search Tags: Convolutional neural network, Object detection, Heterogeneous computing, Pruning, Quantization