
Compilation Optimization And Hardware Acceleration Of Object Detection Algorithm Based On Regional Proposal Network

Posted on: 2023-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Han
Full Text: PDF
GTID: 2558306845990859
Subject: Electronic information
Abstract/Summary:
Region proposal network algorithms are a class of deep neural network (DNN) models whose inherently high computational and storage costs make deployment on edge hardware challenging. Field-programmable gate arrays (FPGAs) offer high throughput and low latency, and a growing number of developers choose them as the implementation platform. As FPGA design automation has advanced, high-level synthesis (HLS) tools have become a popular way to develop FPGA applications. HLS automatically converts high-level languages such as C and C++ into register-transfer-level (RTL) circuit code, which speeds up FPGA development. However, using HLS to accelerate neural networks raises two problems. First, the algorithm is not designed with the hardware accelerator in mind, which leaves a large gap between its theoretical performance and the acceleration actually achieved in hardware. Second, HLS scheduling strategies based on static compilation suffer from defects such as hard-to-detect implicit data fan-out and redundant synchronization logic, so the generated circuits fail to meet their performance targets. To address these two problems, this thesis optimizes both the object detection algorithm and the HLS compiler. The main research results are as follows:

(1) At the algorithm level, this thesis selects Faster R-CNN, an object detection algorithm based on a region proposal network, as the acceleration target and optimizes it for FPGA deployment. First, the backbone network is made lightweight: the bypass and residual structures are kept during model training, and an equivalent transformation is applied at inference time so that only 3×3 convolutions, which accelerate efficiently in hardware, are retained. Second, based on a receptive-field-expansion encoding strategy, the original multi-input, multi-output feature pyramid is reduced to a single-input, multi-output structure to cut the amount of computation. In addition, the model is quantized to 8-bit fixed point, which reduces the resource consumption of the FPGA deployment.

(2) The parallel convolution circuits generated by the HLS compiler suffer from high fan-out, complex synchronization logic, and degraded place-and-route performance on the FPGA. To address this, the thesis proposes a plug-in-based optimization scheme that operates at the level of the compiler's intermediate representation (IR). Custom-written optimization passes (Opt Passes) are compiled into a dynamic-library plug-in and inserted into the HLS compiler framework, where they automatically analyze and transform the input code, resolving the compiler's implicit high fan-out and redundant synchronization problems.

(3) Finally, the design is deployed on a Xilinx ZCU102 board. The experimental results show that, after lightweight optimization of the region proposal network object detection algorithm, the amount of computation is reduced by a factor of 2.9, the number of parameters is reduced by a factor of 5.5, execution efficiency is improved by 37.2%, and accuracy on the Pascal VOC dataset is 75.1%. At the HLS compiler level, the maximum fan-out (including data-broadcast fan-out and control-logic fan-out) of the circuits generated with the proposed compiler optimization is reduced by 72.6% on average, and the maximum frequency (Fmax) of the FPGA design rises from 200 MHz to 245 MHz, an increase of 22.5%.
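The equivalent transformation described in result (1) can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so the code below is only an assumption about its general shape: a training-time block made of a 3×3 convolution, a 1×1 bypass convolution and an identity (residual) branch is folded into a single 3×3 kernel for inference. Batch normalization and bias terms are left out for brevity, and all function names (`merge_branches`, `conv3x3`, `three_branch`) are hypothetical.

```python
# Illustrative sketch only, not the thesis implementation.
# Kernels are indexed as kernel[out_ch][in_ch][row][col];
# feature maps are indexed as x[ch][row][col].


def merge_branches(k3, k1, channels):
    """Return one 3x3 kernel equivalent to (3x3 conv + 1x1 conv + identity)."""
    # Copy the 3x3 kernel so the training-time weights are left untouched.
    merged = [[[row[:] for row in k3[o][i]] for i in range(channels)]
              for o in range(channels)]
    for o in range(channels):
        for i in range(channels):
            # A 1x1 kernel is a 3x3 kernel with only the centre tap set.
            merged[o][i][1][1] += k1[o][i]
            # The identity branch is a centre tap of 1 on matching channels.
            if o == i:
                merged[o][i][1][1] += 1.0
    return merged


def conv3x3(x, k):
    """3x3 convolution with stride 1 and zero padding 1."""
    cin, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(len(k))]
    for o in range(len(k)):
        for i in range(cin):
            for r in range(h):
                for c in range(w):
                    for dr in range(3):
                        for dc in range(3):
                            rr, cc = r + dr - 1, c + dc - 1
                            if 0 <= rr < h and 0 <= cc < w:
                                out[o][r][c] += k[o][i][dr][dc] * x[i][rr][cc]
    return out


def three_branch(x, k3, k1):
    """Training-time form: 3x3 conv, 1x1 conv and identity, summed."""
    y = conv3x3(x, k3)
    cin, h, w = len(x), len(x[0]), len(x[0][0])
    for o in range(len(k3)):
        for r in range(h):
            for c in range(w):
                y[o][r][c] += x[o][r][c]  # identity (residual) branch
                y[o][r][c] += sum(k1[o][i] * x[i][r][c] for i in range(cin))
    return y
```

Because convolution is linear, `conv3x3(x, merge_branches(k3, k1, channels))` produces the same output as `three_branch(x, k3, k1)`, which is why an accelerator only needs a 3×3 convolution engine at inference time. The identity branch requires equal input and output channel counts, as assumed here.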
Keywords/Search Tags: Object detection, High level synthesis, Compiler, FPGA, Neural network accelerator