
Design And Implementation Of Convolutional Neural Network Accelerator Based On ZYNQ

Posted on: 2022-11-01  Degree: Master  Type: Thesis
Country: China  Candidate: D F Wang  Full Text: PDF
GTID: 2518306752953169  Subject: Master of Engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have achieved great success in artificial intelligence and are widely used in object detection, machine vision, image recognition, and related fields. General-purpose processors execute serially and therefore cannot efficiently handle the ever-growing computational load of CNNs. In contrast, field-programmable gate arrays (FPGAs) offer low power consumption and high parallelism, making them well suited to accelerating convolutional neural networks. To meet the high-speed, low-power data-processing requirements of engineering applications, this thesis combines the strengths of the FPGA fabric and the Advanced RISC Machine (ARM) processor on the ZYNQ-7000 series platform to design a heterogeneous system-on-chip (SoC) that fully accelerates the forward-inference process.

First, based on the principles and structural characteristics of CNNs, this thesis analyzes the internal parallelism of CNNs and uses small convolution kernels to locally optimize the network. The system is then partitioned by function: the FPGA performs the data computation, while the ARM core controls the system flow. Network computation and data storage use 16-bit fixed-point numbers, and the convolution and pooling modules are combined in a scheme that is fully parallel across convolution output channels and within the convolution window, and partially parallel across the output feature map. Next, the accelerator core is implemented efficiently through high-level synthesis (HLS), applying optimization strategies matched to the characteristics of each sub-module, such as pipelining, loop unrolling, and array partitioning. Concurrent processes are created for each sub-module under ping-pong operation, and rigid stalling is replaced with a more flexible distributed-handshake architecture, improving the accelerator's overall computational efficiency. C/C++ applications are then developed to control the flow of the entire embedded system.

Finally, a host-computer program built with Python and the PyQt5 library is developed to verify the hardware accelerator system. When testing 10,000 handwritten Arabic numerals at a 100 MHz operating frequency, the average time to process a single image is only 0.94 ms, a 138× speedup over the ARM processor and a 95× speedup over a general-purpose CPU, while consuming only 14% of the CPU's power. The high-speed, low-power hardware accelerator designed in this thesis shows promise for CNN applications in image and vision processing.
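The 16-bit fixed-point arithmetic mentioned above can be sketched in plain C++. The thesis does not specify how the 16 bits are split between integer and fraction, so a Q8.8 format (8 integer bits, 8 fractional bits) is assumed here purely for illustration; a real HLS design would more likely use a vendor type such as `ap_fixed<16, 8>`.

```cpp
#include <cstdint>
#include <cmath>

// Illustrative Q8.8 fixed-point helpers: 16-bit storage, 8 fractional bits.
// (The actual integer/fraction split used in the thesis is not stated.)
using fix16 = int16_t;
constexpr int FRAC_BITS = 8;

// Convert a float to Q8.8 by scaling and rounding.
fix16 to_fix(float x)   { return static_cast<fix16>(std::lround(x * (1 << FRAC_BITS))); }

// Convert Q8.8 back to float.
float to_float(fix16 x) { return static_cast<float>(x) / (1 << FRAC_BITS); }

// Fixed-point multiply: widen to 32 bits so the product does not overflow,
// then shift back down to restore the Q8.8 scale.
fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<int32_t>(a) * b) >> FRAC_BITS);
}
```

Multiply-accumulate chains in the convolution datapath would use the widened intermediate in the same way, truncating only once per output.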
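The HLS optimizations named above (pipelining, loop unrolling, array partitioning) can be illustrated on a small convolution-window kernel. This is a minimal sketch, not the thesis's actual code: the dimensions, buffer names, and pragma placement are assumptions, showing only how output channels are fully unrolled (parallel) while the window multiply-accumulate is pipelined.

```cpp
// Sketch of an output-channel-parallel KxK convolution-window kernel with
// HLS-style directives. Unknown #pragma lines are ignored by an ordinary
// C++ compiler, so this also runs as plain software.
constexpr int OC = 4;   // output channels, fully unrolled (illustrative)
constexpr int K  = 3;   // convolution kernel size (illustrative)

void conv_window(const short win[K][K],        // one input window
                 const short wgt[OC][K][K],    // per-channel weights
                 int acc[OC]) {                // one accumulator per channel
#pragma HLS ARRAY_PARTITION variable=wgt complete dim=1
    for (int oc = 0; oc < OC; ++oc) {
#pragma HLS UNROLL
        int sum = 0;
        for (int i = 0; i < K; ++i)
            for (int j = 0; j < K; ++j) {
#pragma HLS PIPELINE II=1
                sum += win[i][j] * wgt[oc][i][j];  // multiply-accumulate
            }
        acc[oc] = sum;
    }
}
```

Partitioning `wgt` along the channel dimension gives each unrolled channel its own memory port, which is what makes the full output-channel parallelism realizable in hardware.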
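The ping-pong operation can likewise be sketched: two buffers alternate roles so that, in hardware, loading the next tile overlaps with computing on the current one. The tile size, function names, and the stand-in "compute" stage below are all illustrative; the software version necessarily shows the alternation sequentially rather than concurrently.

```cpp
#include <array>
#include <vector>
#include <cstddef>

constexpr int TILE = 4;  // illustrative tile size

// Stand-in for the compute stage (e.g. convolution on one tile).
static int consume(const std::array<int, TILE>& buf) {
    int s = 0;
    for (int v : buf) s += v;
    return s;
}

// Processes a stream tile by tile with two alternating (ping-pong) buffers.
// In hardware the load of bufs[sel ^ 1] would overlap with the compute on
// bufs[sel]; here the swap is shown sequentially for clarity.
int pingpong_total(const std::vector<int>& stream) {
    std::array<int, TILE> bufs[2] = {{}, {}};
    int total = 0;
    int sel = 0;
    for (std::size_t t = 0; t * TILE < stream.size(); ++t) {
        for (int i = 0; i < TILE; ++i)      // load stage fills bufs[sel]
            bufs[sel][i] = stream[t * TILE + i];
        total += consume(bufs[sel]);        // compute stage reads bufs[sel]
        sel ^= 1;                           // swap ping <-> pong
    }
    return total;
}
```

The distributed handshake the thesis describes would replace this fixed alternation with per-module ready/valid signaling, so a slow stage stalls only its neighbors rather than the whole pipeline.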
Keywords/Search Tags:heterogeneous SoC, FPGA, convolutional neural network, hardware acceleration, embedded systems