
Research On Acceleration Method Of Deep Convolutional Neural Network Based On Heterogeneous Computing Platform

Posted on: 2022-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wang
Full Text: PDF
GTID: 2518306563477264
Subject: Signal and Information Processing
Abstract/Summary:
Algorithms based on Deep Convolutional Neural Networks (DCNNs) hold a vital position in the field of computer vision. Compared with traditional algorithms, they achieve higher performance on tasks such as image classification, object detection, and instance segmentation, which has made them a main research direction in academia and industry in recent years. However, due to the inherently high computational load and large parameter count of DCNNs, achieving high-throughput, low-latency inference still faces many challenges in scenarios with strict power and storage constraints. This paper proposes a software-hardware co-design method for a heterogeneous computing platform composed of a Field-Programmable Gate Array (FPGA) and a Central Processing Unit (CPU). The main contributions of this paper are as follows:

(1) At the hardware design level, this paper designs a neural network accelerator based on a CPU+FPGA heterogeneous computing platform. The accelerator builds on the idea of asynchronously computing convolution kernels and successfully extends the ABM-Sparse algorithm to the field of CNN-based object detection. It can flexibly switch heterogeneous working modes according to different detection scenarios, such as high throughput or low latency.

(2) At the algorithm design level, this paper designs an end-to-end CNN optimization engine based on the Roofline model. The engine contains a newly proposed pruning algorithm that re-examines previous work from the perspective of maximizing the computational efficiency of the FPGA. Built on the classic Roofline model from computer architecture, the algorithm ensures that pruning at the algorithm level translates into higher hardware performance gains at deployment. The engine also improves upon and integrates traditional CNN model compression algorithms, realizing end-to-end CNN model compression based on the idea of software-hardware co-design.

(3) This paper also designs an automated hardware design space exploration engine, which realizes an end-to-end automated flow from a full-precision CNN model to FPGA hardware deployment, along with an FPGA demonstration platform that accelerates the YOLOv2 algorithm in real time.

Among DCNN-based algorithms, the You Only Look Once (YOLO) series strikes a good balance between detection accuracy and detection speed, so this paper uses the YOLOv2 algorithm as the benchmark for evaluating the effectiveness of the design. Experimental results show that for scenarios with high throughput requirements, the design achieves a throughput of 2.27 Tera Operations Per Second (TOPS) on an Intel Arria-10 GX1150 FPGA, up to 78.7 Frames Per Second (FPS). For scenarios with low latency requirements, the design achieves a single-frame inference latency of 24 milliseconds on the same platform, reaching 41.7 FPS. The finally deployed YOLOv2 model reaches 74.45% mean Average Precision (mAP) on VOC2007, within 3 percentage points of the official full-precision model (76.8% mAP).
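The Roofline model that underpins the pruning engine can be summarized in one formula: attainable throughput is the minimum of the device's peak compute rate and its memory bandwidth multiplied by the workload's arithmetic intensity (operations per byte moved). A minimal sketch of this bound is shown below; the peak-performance and bandwidth figures used in the example are hypothetical placeholders, not measurements of the Arria-10 platform in this thesis:

```python
def roofline_attainable_gops(arithmetic_intensity, peak_gops, bandwidth_gb_s):
    """Attainable performance (GOPS) under the Roofline model:
    min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_gops, bandwidth_gb_s * arithmetic_intensity)

# Hypothetical device: 1500 GOPS peak, 34 GB/s external bandwidth.
PEAK, BW = 1500.0, 34.0

# A layer with low arithmetic intensity (10 ops/byte) is memory-bound:
print(roofline_attainable_gops(10, PEAK, BW))   # 340.0 (= 34 * 10)

# A layer with high arithmetic intensity (100 ops/byte) is compute-bound:
print(roofline_attainable_gops(100, PEAK, BW))  # 1500.0 (capped at peak)

# The "ridge point" separating the two regimes is peak / bandwidth:
print(PEAK / BW)  # ~44.1 ops/byte
```

The intuition for a Roofline-guided pruning policy follows directly: pruning that leaves a layer below the ridge point yields little wall-clock benefit (the layer stays memory-bound), so the engine can prefer sparsity patterns that keep operations per byte high on the FPGA.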
Keywords/Search Tags: Deep Convolutional Neural Network, unstructured pruning, FPGA, object detection