
Design And Implementation Of Convolutional Neural Network Accelerator Based On ZYNQ

Posted on: 2022-11-01  Degree: Master  Type: Thesis
Country: China  Candidate: D F Wang  Full Text: PDF
GTID: 2518306752953169  Subject: Master of Engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have achieved great success in artificial intelligence and are widely used in object detection, machine vision, image recognition, and related fields. General-purpose processors execute serially and therefore cannot efficiently handle the ever-growing computational load of CNNs. In contrast, field-programmable gate arrays (FPGAs) offer low power consumption and high parallelism, making them well suited to accelerating convolutional neural networks. To meet the high-speed, low-power data-processing requirements of engineering applications, this thesis combines the strengths of the FPGA fabric and the Advanced RISC Machine (ARM) processor on the ZYNQ-7000 series platform to design a heterogeneous system-on-chip (SoC) that fully accelerates the forward-inference process.

First, based on the principles and structural characteristics of CNNs, this thesis analyzes the internal parallelism of CNNs and uses small convolution kernels to locally optimize the network. The system is then partitioned by function: the FPGA performs the data computation, while the ARM core controls the system flow. Network computation and data storage use 16-bit fixed-point numbers, and the convolution and pooling modules are combined in a scheme that is fully parallel across convolution output channels and within the convolution window, and partially parallel across the output feature map. Next, the accelerator core is implemented efficiently through high-level synthesis (HLS), applying optimization strategies matched to the characteristics of each sub-module, such as pipelining, loop unrolling, and array partitioning. Concurrent processes are created for each sub-module under ping-pong operation, and rigid stalling is replaced with a more flexible distributed-handshake architecture, improving the accelerator's overall computational efficiency. C/C++ applications are then developed to control the flow of the entire embedded system.

Finally, a host-computer program built with Python and the PyQt5 library is developed to verify the hardware accelerator system. When testing 10,000 handwritten Arabic numerals at a 100 MHz operating frequency, the average time to process a single image is only 0.94 ms, a 138× speedup over the ARM processor and a 95× speedup over a general-purpose CPU, while consuming only 14% of the CPU's power. The high-speed, low-power hardware accelerator designed in this thesis shows promise for CNN applications in image and vision processing.
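The 16-bit fixed-point arithmetic mentioned above can be sketched in plain C++. The thesis does not specify how the 16 bits are split between integer and fraction, so a Q8.8 format (8 integer bits, 8 fractional bits) is assumed here purely for illustration; a real HLS design would more likely use a vendor type such as `ap_fixed<16, 8>`.

```cpp
#include <cstdint>
#include <cmath>

// Illustrative Q8.8 fixed-point helpers: 16-bit storage, 8 fractional bits.
// (The actual integer/fraction split used in the thesis is not stated.)
using fix16 = int16_t;
constexpr int FRAC_BITS = 8;

// Convert a float to Q8.8 by scaling and rounding.
fix16 to_fix(float x)   { return static_cast<fix16>(std::lround(x * (1 << FRAC_BITS))); }

// Convert Q8.8 back to float.
float to_float(fix16 x) { return static_cast<float>(x) / (1 << FRAC_BITS); }

// Fixed-point multiply: widen to 32 bits so the product does not overflow,
// then shift back down to restore the Q8.8 scale.
fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<int32_t>(a) * b) >> FRAC_BITS);
}
```

Multiply-accumulate chains in the convolution datapath would use the widened intermediate in the same way, truncating only once per output.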
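The HLS optimizations named above (pipelining, loop unrolling, array partitioning) can be illustrated on a small convolution-window kernel. This is a minimal sketch, not the thesis's actual code: the dimensions, buffer names, and pragma placement are assumptions, showing only how output channels are fully unrolled (parallel) while the window multiply-accumulate is pipelined.

```cpp
// Sketch of an output-channel-parallel KxK convolution-window kernel with
// HLS-style directives. Unknown #pragma lines are ignored by an ordinary
// C++ compiler, so this also runs as plain software.
constexpr int OC = 4;   // output channels, fully unrolled (illustrative)
constexpr int K  = 3;   // convolution kernel size (illustrative)

void conv_window(const short win[K][K],        // one input window
                 const short wgt[OC][K][K],    // per-channel weights
                 int acc[OC]) {                // one accumulator per channel
#pragma HLS ARRAY_PARTITION variable=wgt complete dim=1
    for (int oc = 0; oc < OC; ++oc) {
#pragma HLS UNROLL
        int sum = 0;
        for (int i = 0; i < K; ++i)
            for (int j = 0; j < K; ++j) {
#pragma HLS PIPELINE II=1
                sum += win[i][j] * wgt[oc][i][j];  // multiply-accumulate
            }
        acc[oc] = sum;
    }
}
```

Partitioning `wgt` along the channel dimension gives each unrolled channel its own memory port, which is what makes the full output-channel parallelism realizable in hardware.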
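The ping-pong operation can likewise be sketched: two buffers alternate roles so that, in hardware, loading the next tile overlaps with computing on the current one. The tile size, function names, and the stand-in "compute" stage below are all illustrative; the software version necessarily shows the alternation sequentially rather than concurrently.

```cpp
#include <array>
#include <vector>
#include <cstddef>

constexpr int TILE = 4;  // illustrative tile size

// Stand-in for the compute stage (e.g. convolution on one tile).
static int consume(const std::array<int, TILE>& buf) {
    int s = 0;
    for (int v : buf) s += v;
    return s;
}

// Processes a stream tile by tile with two alternating (ping-pong) buffers.
// In hardware the load of bufs[sel ^ 1] would overlap with the compute on
// bufs[sel]; here the swap is shown sequentially for clarity.
int pingpong_total(const std::vector<int>& stream) {
    std::array<int, TILE> bufs[2] = {{}, {}};
    int total = 0;
    int sel = 0;
    for (std::size_t t = 0; t * TILE < stream.size(); ++t) {
        for (int i = 0; i < TILE; ++i)      // load stage fills bufs[sel]
            bufs[sel][i] = stream[t * TILE + i];
        total += consume(bufs[sel]);        // compute stage reads bufs[sel]
        sel ^= 1;                           // swap ping <-> pong
    }
    return total;
}
```

The distributed handshake the thesis describes would replace this fixed alternation with per-module ready/valid signaling, so a slow stage stalls only its neighbors rather than the whole pipeline.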
Keywords/Search Tags:heterogeneous SoC, FPGA, convolutional neural network, hardware acceleration, embedded systems