
Zynq-based Convolutional Neural Network Embedded Acceleration System Design

Posted on: 2020-03-24    Degree: Master    Type: Thesis
Country: China    Candidate: X Kuang    Full Text: PDF
GTID: 2438330626953234    Subject: Communication and Information System
Abstract/Summary:
Convolutional Neural Networks (CNN), as one of the representative algorithms of deep learning, have been widely used in image classification, target recognition, speech recognition, and other fields. CNN inference requires a huge amount of computation, while the performance of traditional embedded systems is rather limited, so it is difficult to meet the real-time requirements of applications such as autonomous driving. Therefore, there is an urgent need for a new inference acceleration system with higher performance. As a new heterogeneous CPU+FPGA computing platform, Zynq is one of the most promising platforms for accelerating CNNs. Based on Zynq, this thesis implements an embedded inference acceleration system that can adapt to multiple CNN models.

Firstly, the thesis summarizes the development of neural network theory and extracts four basic operators from three classical CNN models. Secondly, a software-hardware co-design scheme for CNN inference on Zynq is proposed, and fixed-point quantization is studied. Thirdly, hardware acceleration methods for each operator on the FPGA are studied and the corresponding IP cores are designed; the structural characteristics of the FPGA are exploited in depth through data reuse and parallelism exploration. Additionally, the designed IP cores are used to build an embedded CNN inference acceleration system on the Zynq platform, and the related driver design and software development are completed. Finally, the system is verified and tested with the LeNet-5, AlexNet, and VGG-16 CNN models.

The results show that the designed inference acceleration system can accommodate CNN models with convolution kernels of different sizes. On the ZedBoard platform, the system achieves 0.08 GOP/s, 8.4 GOP/s, and 32.6 GOP/s for LeNet-5, AlexNet, and VGG-16, respectively. For VGG-16, the model with the largest amount of computation, comparative tests show that the acceleration system achieves 32.1 times the speed and 503 times the efficiency of the CPU, while the precision loss remains below 3%.
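As a rough illustration of the fixed-point quantization step mentioned above, the following Python sketch shows one common symmetric scheme with a per-tensor power-of-two scale. The abstract does not specify the thesis's exact quantization method, so the bit width, scaling rule, and function names here are illustrative assumptions rather than the author's implementation.

# Minimal sketch of symmetric fixed-point quantization for CNN weights/activations.
# Assumption: an 8-bit signed format with a per-tensor power-of-two scale; the
# thesis's actual scheme may differ (e.g., per-layer bit widths or calibration).
import numpy as np

def quantize_fixed_point(x: np.ndarray, total_bits: int = 8) -> tuple[np.ndarray, int]:
    """Quantize a float tensor to signed fixed-point; return (integers, fractional bits)."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x, dtype=np.int8), total_bits - 1
    # Bits needed for the integer part (sign bit excluded).
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale),
                -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1)
    return q.astype(np.int8), frac_bits

def dequantize(q: np.ndarray, frac_bits: int) -> np.ndarray:
    """Recover an approximate float tensor from the fixed-point representation."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: quantize a small weight tensor and check the reconstruction error,
# i.e. the kind of precision loss the comparative tests in the abstract refer to.
weights = np.random.randn(3, 3).astype(np.float32) * 0.5
q, frac_bits = quantize_fixed_point(weights)
error = np.max(np.abs(weights - dequantize(q, frac_bits)))
print(f"fractional bits: {frac_bits}, max abs error: {error:.4f}")

Using a power-of-two scale keeps dequantization as a simple bit shift on the FPGA side, which is one common reason fixed-point formats are preferred over floating point in embedded CNN accelerators.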
Keywords/Search Tags:CNN, Accelerator, Quantization, Winograd, Zynq, FPGA