
Research And Implementation Of Convolutional Neural Network Acceleration Method Based On FPGA

Posted on: 2022-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: H J Gong
Full Text: PDF
GTID: 2518306332492974
Subject: Computer technology
Abstract/Summary:
In the field of aerospace, remote sensing image processing is becoming increasingly intelligent, and artificial intelligence algorithms such as Convolutional Neural Networks (CNNs) are gradually replacing traditional algorithms. Convolutional neural network algorithms are evolving rapidly to meet the more complex task scenarios of the future. On-board systems therefore need to deploy convolutional neural networks quickly, and the deployed networks must be optimized and iterated quickly as target requirements change. The hardware platform most widely used in on-board systems is the Field Programmable Gate Array (FPGA). For complex algorithms such as convolutional neural networks, the traditional approach of developing FPGAs in a hardware description language is difficult and has a long development cycle, while high-level abstract design tools shorten the cycle but deliver poor performance. The focus of this paper is therefore to improve network performance while preserving rapid deployment and iteration of convolutional neural networks. The main work of this paper includes the following aspects:

(1) The current state of network compression and FPGA-based convolutional neural network acceleration is analyzed and summarized, establishing the research direction and content for the subsequent network optimization and accelerator hardware design. Related theories of convolutional neural networks and FPGAs are introduced, and, according to the needs of the project, the ResNet18 convolutional neural network is described in detail to lay the groundwork for the subsequent design. By comparing the pros and cons of different development tools, the development tools used in this paper are selected to meet the goal of rapid network deployment and iteration.

(2) Taking ResNet18 as the research object, optimization strategies for convolutional neural networks are studied. First, the network structure of ResNet18 is analyzed, and the network is optimized in two ways: the complexity of the network model is reduced by fusing each convolutional layer with its batch normalization (BN) layer, and the input data and weights of the network are quantized from the original floating-point values to signed 8-bit fixed-point values. The network is tested before and after optimization on the UC Merced Land Use data set. The results show that the classification accuracy of the network is essentially unchanged, while the network size drops to one quarter of the original.

(3) Based on the convolution calculation process, a parallel convolution acceleration architecture is proposed. To reduce FPGA on-chip cache consumption, a tiling method for the input data and weight data is designed. To increase data reuse, loop interchange is used to increase the reuse of input data across channels, and a line buffer is designed to increase the reuse of input data in the two-dimensional plane. For convolution parallelism, parallel computation over the input channels, output channels, and convolution kernels is designed and realized in hardware with a multiplier array and an adder tree, and a data-parallelism exploration algorithm is proposed to optimize the utilization of hardware resources. The results show that with these convolution optimization strategies, the maximum operating frequency of the convolutional neural network acceleration system designed in this paper reaches 225 MHz, the average performance over multiple convolutional layers is 45.13 GOPS, the power consumption is only 4.268 W, and the energy efficiency ratio is 10.57 GOPS/W. Compared with other published designs, this work achieves a certain improvement in DSP efficiency and energy efficiency ratio.

(4) The CNN accelerator is implemented and verified with High-Level Synthesis (HLS). Based on the hardware design above, the acceleration unit is coded in HLS, synthesized into a Register Transfer Level (RTL) implementation, and packaged as an IP core; the functional correctness of the IP core is verified through behavioral simulation. Vivado is then used to integrate the IP core with the ZYNQ processing system and perform place and route, and the timing, power consumption, resource utilization, and performance of the generated hardware acceleration system are analyzed and compared to locate accelerator performance bottlenecks. The accelerator took about two months from design to implementation; compared with traditional development methods, the development cycle is greatly shortened, providing a useful reference for the rapid deployment and iteration of convolutional neural networks on FPGAs.
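The abstract does not spell out the formulas behind the two optimizations in (2). The sketch below, assuming the standard BN-folding identity and symmetric per-tensor 8-bit quantization (function names and the quantization scheme are illustrative, not taken from the thesis), shows how a conv layer's weights and bias absorb the BN parameters, and how the result maps to signed 8-bit values:

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BN layer into the preceding conv layer.

    w: conv weights, shape (out_ch, in_ch, kh, kw); b: conv bias, shape (out_ch,).
    gamma/beta/mean/var: per-output-channel BN parameters and running statistics.
    Returns fused (w', b') such that conv(x, w') + b' == BN(conv(x, w) + b).
    """
    scale = gamma / np.sqrt(var + eps)           # per-output-channel BN scale
    w_fused = w * scale[:, None, None, None]     # scale every output filter
    b_fused = (b - mean) * scale + beta          # shift the bias accordingly
    return w_fused, b_fused

def quantize_int8(x):
    """Symmetric signed 8-bit quantization: x ~= scale * q, q in [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale
```

Since the BN scale and shift are per output channel, fusion removes the BN layer at no accuracy cost, and the subsequent int8 quantization of weights and inputs is what yields the roughly 4x reduction in model size reported above.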
Keywords/Search Tags:CNN, FPGA, HLS, Hardware Acceleration