
Design And Implementation Of Faster R-CNN Accelerator Based On Heterogeneous Processor

Posted on: 2021-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
Full Text: PDF
GTID: 2428330614970660
Subject: Electronic and communication engineering
Abstract/Summary:
Deep learning has brought breakthroughs to machine learning and computer vision. However, deep convolutional networks are extremely computationally complex, and CPUs, designed for general-purpose computing, are ill-suited to accelerating convolutional workloads. GPUs, with their large numbers of parallel computing units, are widely used in the model training stage. In the inference stage, however, the workload is dominated by single-data, multiple-instruction streams, so the GPU's high-bandwidth, high-throughput design is a poor match, and its power consumption is too high for embedded applications. As a programmable logic device, the FPGA offers high performance, low power consumption, low latency, and reconfigurability, making it well suited to accelerating convolutional neural networks both at the edge and in the cloud. The design of FPGA-based CNN accelerators has therefore become a research focus. Targeting Faster R-CNN, the classic two-stage object detection algorithm, this paper optimizes the detection algorithm and implements an FPGA-based object detection hardware accelerator through hardware-software co-design.

Firstly, this paper optimizes the Faster R-CNN algorithm for the FPGA platform. Replacing VGG-16 with ResNet-50 as the backbone network improves feature extraction while reducing the model's parameter count and computation. Increasing the ROI pooling output size alleviates the poor small-object detection caused by quantizing the CNN model. Fusing each convolutional layer with its batch normalization layer reduces the computation of forward inference. While maintaining detection accuracy, the network model is quantized to 8-bit fixed point, effectively compressing the model and easing bandwidth pressure.

Secondly, this paper completes an FPGA hardware acceleration architecture based on the OpenCL heterogeneous computing framework. It designs a data transmission kernel and acceleration kernels for convolution, max pooling, and ROI pooling, and connects the kernels through OpenCL channels to form a deep pipeline. Because the classification and regression computations of the multiple proposals produced by ROI pooling are mutually independent, this paper reorders the proposal data to expose parallelism in those computations. A design space exploration over the two parallelism parameters of the hardware design, carried out through multiple sets of experiments, yields the peak performance of the architecture on the target board.

Finally, this paper designs and implements a Faster R-CNN object detection acceleration system on the Intel Arria 10 GX1150 FPGA development board. In terms of energy efficiency, this work achieves 40 times that of a CPU and 1.6 times that of a GPU, delivering more computing capability per unit of energy. For inference performance, it achieves 1.73 frames per second at 224×224 input resolution and 0.75 frames per second at 800×600, a speedup of approximately 51% over a state-of-the-art FPGA acceleration solution for Faster R-CNN, with an mAP of 72.83%. The system thus delivers strong inference performance while maintaining high detection accuracy.
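The fusion of a convolutional layer with its batch normalization layer can be sketched as follows. At inference time, BN applies a fixed per-channel affine transform, which folds into the preceding convolution's weights and bias; the shapes and NumPy formulation below are illustrative, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: C_out output channels, each with a flattened kernel of length K.
C_out, K = 4, 27          # e.g. 3x3 kernels over 3 input channels
W = rng.standard_normal((C_out, K))
b = rng.standard_normal(C_out)

# Per-channel BatchNorm parameters (inference mode, running statistics).
gamma = rng.standard_normal(C_out)
beta  = rng.standard_normal(C_out)
mean  = rng.standard_normal(C_out)
var   = rng.random(C_out) + 0.5
eps   = 1e-5

# Fold BN into the conv weights and bias:
#   y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
#     = (scale * W) x + (scale * (b - mean) + beta),  scale = gamma / sqrt(var + eps)
scale   = gamma / np.sqrt(var + eps)
W_fused = W * scale[:, None]
b_fused = scale * (b - mean) + beta

# Check equivalence on a random input patch (one spatial position of the conv).
x = rng.standard_normal(K)
y_ref   = gamma * (W @ x + b - mean) / np.sqrt(var + eps) + beta
y_fused = W_fused @ x + b_fused
print(np.allclose(y_ref, y_fused))  # True
```

After this folding, the BN layer disappears from the inference graph entirely, which is why the fusion reduces forward-pass computation.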
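The 8-bit fixed-point quantization step can be illustrated with a minimal symmetric-quantization sketch. The scale choice and NumPy code are assumptions for illustration; the thesis does not specify its exact quantization scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000).astype(np.float32)

# Symmetric 8-bit quantization: map [-max|w|, +max|w|] onto integer range [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize and measure the rounding error introduced by 8-bit storage.
deq = q.astype(np.float32) * scale
max_err = np.abs(weights - deq).max()
print(q.dtype, max_err <= scale / 2 + 1e-6)
```

Storing int8 values instead of 32-bit floats shrinks the model roughly 4x, which is the source of the bandwidth relief the abstract mentions.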
Keywords/Search Tags: Deep Learning, Object Detection, OpenCL, FPGA, Faster R-CNN, Heterogeneous Computing