
Research On Parallel Acceleration Design Of Convolutional Neural Network Based On FPGA

Posted on: 2022-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zheng
Full Text: PDF
GTID: 2518306602966479
Subject: Master of Engineering
Abstract/Summary:
In recent years, artificial intelligence has advanced rapidly, and convolutional neural network (CNN) algorithms are now widely used in video surveillance, machine vision, pattern recognition, and other fields. Their enormous computational cost, however, makes it difficult to deploy CNN algorithms on embedded terminal platforms, and hardware acceleration of these networks has accordingly become an active research area. Starting from the goal of low resource consumption, and aiming to improve acceleration performance and efficiency, this thesis studies the hardware acceleration of the representative VGG-16 network on an FPGA development platform through an exploration of parallel design methods.

First, the characteristics of several typical convolutional neural networks are compared and analyzed, the distribution of computation across their layers is calculated, and an acceleration scheme centered on the convolutional layers is determined. In exploring the design space for convolutional-layer acceleration, the focus is on parallelization within the convolution operation itself: the feasibility and hardware-implementation complexity of four basic parallel methods are examined, and two different parallel design schemes are selected. Key optimization techniques in hardware acceleration design are then combined with these parallel schemes to determine the overall scheduling of the convolution operations.

Second, using the resource-parallel mode, and starting from the two basic ideas of sub-operation submodule design and the mining of parallelism, a convolution acceleration architecture with good generality across typical neural networks is implemented. The parallel design combines two methods: intra-layer parallelism and convolution-window parallelism. To match the hardware computing array, a basic data-storage format is defined and the data are rearranged into 3D cubes so that they can be retrieved more conveniently. The convolution module uses data slicing to cope with the shortage of on-chip storage resources and data reuse to avoid repeatedly fetching data from the on-chip cache. On a Xilinx Zynq-7030 FPGA experimental platform, the design achieves a comparatively high multiplier efficiency at a low degree of parallelism and exhibits excellent power characteristics. Compared with prior designs on the same platform, it improves multiplier efficiency by 2.2 times and energy efficiency by 3.5 times, making it suitable for application scenarios with low power budgets and modest recognition-speed requirements.

Finally, building on the above design, a highly parallel acceleration scheme is proposed that parallelizes across multiple input channels and multiple output channels and is optimized for the VGG-16 network structure. Acceleration performance is further improved through preprocessing of the first convolutional layer, optimization of the pooling module, an improved slicing scheme, and on-chip cache optimization. On the Zynq-ZCU102 platform, the scheme achieves a significant speedup: at a clock frequency of 120 MHz, one inference takes only 200.74 ms, the average throughput reaches 154.13 GOPS, and the convolutional-layer throughput reaches 216.99 GOPS, a computing-performance improvement of roughly 16 times over the Zynq-7030 platform. The design also performs well in terms of power: it consumes only 5.374 W and achieves an energy efficiency of 28.68 GOPS/W, a 1.5-times improvement over the NVIDIA Jetson TX2 GPU; compared with designs on the same FPGA platform, it has clear advantages in both energy efficiency and accelerated computing throughput.
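To make the multi-input/multi-output channel parallelism concrete, the following is a minimal software sketch of a tiled convolutional layer. It is not the thesis's RTL design: the tile sizes Tm and Tn, the array shapes, and the loop order are illustrative assumptions. In hardware, each (output-channel, input-channel) tile of size Tm x Tn would map onto a grid of parallel MAC units fed from on-chip buffers; here the tiling is shown as explicit loop blocking over NumPy arrays.

```python
import numpy as np

def conv2d_tiled(ifm, w, Tm=2, Tn=2):
    """Stride-1, no-padding convolution with input/output channel tiling.

    ifm: input feature map, shape (N_in, H, W)
    w:   weights, shape (M_out, N_in, K, K)
    Tm, Tn: output/input channel tile sizes (hypothetical values;
            on an FPGA these set the degree of MAC parallelism).
    """
    M, N, K = w.shape[0], w.shape[1], w.shape[2]
    H, W = ifm.shape[1], ifm.shape[2]
    Ho, Wo = H - K + 1, W - K + 1
    ofm = np.zeros((M, Ho, Wo))
    # The two outer loops walk channel tiles; in hardware every
    # (m, n) pair inside a tile would execute concurrently.
    for m0 in range(0, M, Tm):              # output-channel tile
        for n0 in range(0, N, Tn):          # input-channel tile
            for m in range(m0, min(m0 + Tm, M)):
                for n in range(n0, min(n0 + Tn, N)):
                    for i in range(K):
                        for j in range(K):
                            # One shifted window per kernel tap,
                            # accumulated into the output channel.
                            ofm[m] += w[m, n, i, j] * ifm[n, i:i + Ho, j:j + Wo]
    return ofm
```

Blocking the channel loops this way is also what makes the data slicing described above natural: each tile only needs the Tn input-channel slices and Tm partial output channels resident in on-chip buffers at a time.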
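The reported ZCU102 figures are internally consistent, which a few lines of arithmetic make visible. Assuming the usual convention that GOPS counts multiplies and adds separately (so one MAC is two operations), the average throughput times the latency recovers the per-inference workload, which matches VGG-16's commonly cited cost of about 31 GOP:

```python
# Figures reported for the Zynq-ZCU102 design
latency_s = 200.74e-3   # one VGG-16 inference
avg_gops = 154.13       # average throughput
power_w = 5.374         # measured power

total_gop = avg_gops * latency_s   # implied workload per inference (~30.94 GOP)
efficiency = avg_gops / power_w    # energy efficiency in GOPS/W (~28.68)
print(f"{total_gop:.2f} GOP per inference, {efficiency:.2f} GOPS/W")
```

The implied 30.94 GOP agrees with VGG-16's known compute cost, and 154.13 / 5.374 reproduces the stated 28.68 GOPS/W.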
Keywords/Search Tags: Convolutional Neural Network, FPGA, Hardware Acceleration Design, Parallel Computing Scheme