
Research On Acceleration Scheme Of Convolutional Neural Network Based On CPU-FPGA Heterogeneous Computing

Posted on: 2022-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Lei
Full Text: PDF
GTID: 2518306737456464
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of artificial intelligence technology, convolutional neural networks (CNNs) have been widely used to solve complex problems and have attracted considerable attention in both academia and industry. Driven by Internet of Things technology, CNNs are also being deployed on embedded and mobile devices to implement intelligent functions. In these application scenarios, FPGAs are particularly well suited to accelerating CNN computation thanks to their high performance, low latency, low power consumption, and short development cycle. To overcome the limitations imposed by the large parameter counts and computational cost of CNNs, and to support diverse network structures, researchers have designed a deep learning processing unit (DPU) on the FPGA for general-purpose CNN acceleration. However, the current DPU acceleration scheme suffers from low DPU utilization and low DPU scheduling efficiency. To address these two problems, this thesis proposes the MCDS acceleration scheme and the DPU Plus acceleration scheme. The main research contents are as follows:

1. For the hardware implementation of CNNs on FPGAs, this thesis adopts the DPU acceleration scheme provided by Xilinx. The DPU in this scheme is a general-purpose CNN accelerator; working together with Xilinx's software tools, it can accelerate the inference of a variety of CNN models. Using this scheme, this thesis implements several commonly used CNN models in hardware and applies them to intelligent tasks in intelligent transportation scenarios, achieving a certain level of detection speed and accuracy.

2. For the problem of low DPU utilization when executing CNN models on the DPU, this thesis proposes the MCDS acceleration scheme, which deploys multiple DPU cores of different sizes on an FPGA with limited hardware resources. This thesis completes the hardware implementation of DPU cores of different sizes and measures the DPU utilization and FPS of several commonly used CNN models on each core size. Experimental data show that, compared with the Xilinx DPU acceleration scheme, MCDS effectively improves DPU utilization and the number of deployable DPU cores, thereby increasing the overall throughput of the DPU and accelerating computation.

3. For the problem of low DPU scheduling efficiency in application systems built on CNN models, this thesis proposes the DPU Plus acceleration scheme, which implements the DPU core and auxiliary modules on the FPGA simultaneously and lets the two kinds of modules complete computing tasks together. DPU Plus is a general hardware design scheme that makes full use of the high performance and flexibility of the FPGA. With an improved development flow, this thesis completes a hardware implementation of DPU Plus and develops an upper-layer application system on top of it. Experimental data show that, compared with the Xilinx DPU acceleration scheme, DPU Plus effectively improves DPU scheduling efficiency, thereby increasing the overall throughput of the application system and accelerating computation.
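The abstract gives no code, but the Xilinx DPU scheme described in point 1 is typically driven through the Vitis AI runtime (VART). Below is a minimal sketch of running one compiled model on a DPU core; the model file name `yolov3_traffic.xmodel` and the zero-filled input are placeholders, not artifacts of the thesis.

```python
# Minimal sketch of DPU inference via the Vitis AI runtime (VART).
# The .xmodel file name and input data are hypothetical placeholders.
import numpy as np
import vart
import xir

# Load the compiled model and locate the DPU subgraph.
graph = xir.Graph.deserialize("yolov3_traffic.xmodel")  # hypothetical file
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(
    s for s in subgraphs
    if s.has_attr("device") and s.get_attr("device").upper() == "DPU"
)

# Create a runner bound to the DPU core.
runner = vart.Runner.create_runner(dpu_subgraph, "run")
in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]

# Allocate host buffers matching the tensor shapes (int8, since DPU
# models are quantized) and submit one asynchronous inference job.
input_data = [np.zeros(tuple(in_tensor.dims), dtype=np.int8)]
output_data = [np.zeros(tuple(out_tensor.dims), dtype=np.int8)]
job_id = runner.execute_async(input_data, output_data)
runner.wait(job_id)
```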
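The thesis does not detail the MCDS scheduling logic, but its core idea in point 2, keeping several DPU cores of different sizes busy at once, can be illustrated with a simple dispatcher that feeds a shared job queue to one worker thread per core. The first-free-core policy and the `run_on_core` callback (standing in for a per-core VART runner) are assumptions for illustration only.

```python
# Illustrative multi-core dispatch in the spirit of MCDS: several DPU
# cores of different sizes drain a shared queue of inference jobs.
# The scheduling policy (first free core wins) is an assumption; the
# thesis does not specify MCDS's actual policy.
import queue
import threading

def dispatch(jobs, cores, run_on_core):
    """jobs: list of inputs; cores: list of DPU core handles;
    run_on_core(core, job): blocking inference call (assumed)."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    def worker(core):
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            run_on_core(core, job)
            q.task_done()

    threads = [threading.Thread(target=worker, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```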
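Likewise, the DPU Plus idea in point 3 of letting the DPU core and an auxiliary module complete a task together can be sketched as a two-stage pipeline: while the auxiliary module prepares the next frame, the DPU computes on the current one. The stage functions `aux_process` and `dpu_infer` are hypothetical stand-ins; the thesis does not describe the auxiliary module's actual interface.

```python
# Two-stage pipeline sketch for the DPU Plus idea: an auxiliary module
# (stage 1) and the DPU core (stage 2) overlap their work on a stream
# of frames. aux_process and dpu_infer are hypothetical stand-ins.
import queue
import threading

def pipeline(frames, aux_process, dpu_infer):
    handoff = queue.Queue(maxsize=2)  # small buffer between stages
    results = []

    def stage1():
        for frame in frames:
            handoff.put(aux_process(frame))  # e.g. resize/quantize step
        handoff.put(None)  # sentinel: no more frames

    def stage2():
        while (item := handoff.get()) is not None:
            results.append(dpu_infer(item))

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```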
Keywords/Search Tags:Convolutional Neural Network, FPGA, DPU, Acceleration