
Research And Implementation Of FPGA Acceleration Method For Convolutional Neural Network

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Man
Full Text: PDF
GTID: 2518306554952979
Subject: Computer Science and Technology

Abstract/Summary:
Convolutional Neural Networks (CNNs) have become one of the most successful algorithms in computer vision and are widely used in many fields. However, as CNNs grow deeper and their model structures become more complex, their computational load increases dramatically, consuming large amounts of resources and energy. Traditional general-purpose processors (CPUs) cannot meet real-time requirements because of their serial computing model, while Graphics Processing Units (GPUs) are unsuitable for mobile embedded platforms because of their high power consumption. Field Programmable Gate Arrays (FPGAs) are flexible, programmable, and low-power, with short development cycles, and hold a unique advantage over CPUs and GPUs in performance per watt. Accelerating CNN models in parallel within the limited hardware resources of an FPGA has therefore become a hot topic in industry, and the core of the problem is the design of a general-purpose acceleration module architecture.

Firstly, the overall architecture of CNNs and the computation patterns of different models are analyzed, identifying the modules and processes suitable for parallel computation and the possible optimizations. On this basis, a hardware-software co-design scheme is devised and partitioned, and an overall ARM+FPGA system framework is designed. In the hardware design, input/output units, convolution units based on multiply-accumulate arrays, pooling units, activation function units, and reordering units are implemented with High-Level Synthesis (HLS), and the computation modules are integrated in a pipelined fashion. For optimization, compression quantization, pipelining, loop optimization, ping-pong buffering, and parameter/data caching are applied to reduce memory usage and speed up computation. The Roofline performance model is used to tune the hardware structure against the limited storage and computing resources of the FPGA. Finally, FPGA-accelerated forward inference of the CNN is achieved.

Secondly, the proposed FPGA-based parallel acceleration design is validated and analyzed. Using the Xilinx Zynq-7000 series PYNQ-Z2 development board, a complete verification platform for the CNN accelerator is built. The LeNet-5 and YOLOv2 models, typical lightweight CNNs for image classification and object detection respectively, are validated and compared. For LeNet-5, performance tests are run on the MNIST and CIFAR-10 datasets. Taking MNIST as an example, the results show that the FPGA consumes 3.6% of the power of an i5-8300H CPU while delivering 20.5 times its energy efficiency; compared with a GTX 1060 GPU, it consumes 2.1% of the power and delivers 12.3 times the energy efficiency. For YOLOv2, a computational performance of 26.23 GOP/s is achieved experimentally: 5 times the performance of the i5-8300H and 87 times that of an ARM Cortex-A9. In energy efficiency, the FPGA is 95 times higher than the i5-8300H, 145 times higher than the ARM, and 6.8 times higher than the GTX 1060.
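The multiply-accumulate (MAC) convolution unit and the pipelining idea described above can be sketched as follows. This is a minimal illustrative example, not the thesis's actual design: the kernel size, loop ordering, and function name are assumptions, and a real HLS implementation would also apply the loop optimizations and ping-pong buffering the abstract mentions. The `#pragma HLS PIPELINE` directive is meaningful only to an HLS compiler; a plain C++ compiler ignores it.

```cpp
#include <cstddef>

// Sketch of a single-channel 2-D convolution built from a MAC loop, in the
// HLS C++ style the abstract describes. H x W input, K x K kernel,
// "valid" padding: output is (H-K+1) x (W-K+1). All names/sizes are
// illustrative assumptions, not taken from the thesis.
template <size_t H, size_t W, size_t K>
void conv2d_valid(const float in[H][W], const float kernel[K][K],
                  float out[H - K + 1][W - K + 1]) {
    for (size_t r = 0; r + K <= H; ++r) {
        for (size_t c = 0; c + K <= W; ++c) {
#pragma HLS PIPELINE II=1  // HLS-only: pipeline the MAC loop; no-op in g++
            float acc = 0.0f;  // multiply-accumulate over the K x K window
            for (size_t i = 0; i < K; ++i)
                for (size_t j = 0; j < K; ++j)
                    acc += in[r + i][c + j] * kernel[i][j];
            out[r][c] = acc;
        }
    }
}
```

In an HLS flow, pipelining this loop lets a new output pixel start every clock cycle once the pipeline fills, which is where the parallel speedup over a serial CPU loop comes from.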
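The Roofline model used above to balance the design against limited FPGA resources reduces to a simple bound: attainable throughput is the minimum of peak compute and memory bandwidth times operational intensity. The sketch below shows that bound; the numbers in the usage note are illustrative assumptions, not the PYNQ-Z2's actual figures.

```cpp
#include <algorithm>

// Roofline bound: attainable GOP/s is capped either by the accelerator's
// peak compute rate or by how fast memory can feed it, whichever is lower.
//   peak_gops       - peak compute throughput of the design (GOP/s)
//   bandwidth_gbps  - off-chip memory bandwidth (GB/s)
//   ops_per_byte    - operational intensity of the kernel (operations/byte)
double roofline_gops(double peak_gops, double bandwidth_gbps,
                     double ops_per_byte) {
    return std::min(peak_gops, bandwidth_gbps * ops_per_byte);
}
```

For example, with an assumed 100 GOP/s compute roof and 4 GB/s of bandwidth, a kernel at 10 ops/byte is memory-bound at 40 GOP/s; raising intensity (e.g., via the on-chip caching the abstract describes) moves the design toward the compute roof.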
Keywords/Search Tags: Computer vision, CNN, FPGA, Hardware acceleration, HLS