
Zynq-based Convolutional Neural Network Embedded Acceleration System Design

Posted on: 2020-03-24    Degree: Master    Type: Thesis
Country: China    Candidate: X Kuang    Full Text: PDF
GTID: 2438330626953234    Subject: Communication and Information System
Abstract/Summary:
Convolutional Neural Networks (CNN), as one of the representative algorithms of deep learning, have been widely used in image classification, target recognition, speech recognition, and other fields. CNN inference requires a huge amount of computation, while the performance of traditional embedded systems is rather limited, so it is difficult to meet the real-time requirements of applications such as autonomous driving. Therefore, there is an urgent need for a new inference acceleration system with higher performance. As a new heterogeneous CPU+FPGA computing platform, Zynq is one of the most promising platforms for accelerating CNNs. Based on Zynq, this thesis implements an embedded inference acceleration system that can adapt to multiple CNN models.

Firstly, the thesis summarizes the development of neural network theory and extracts four basic operators from three classical CNN models. Secondly, a software-hardware co-design scheme for CNN inference on Zynq is proposed, and fixed-point quantization is studied. Thirdly, hardware acceleration methods for each operator on the FPGA are studied and the corresponding IP cores are designed; the structural characteristics of the FPGA are exploited in depth through data reuse and parallelism exploration. Additionally, the designed IP cores are used to build an embedded CNN inference acceleration system on the Zynq platform, and the related driver design and software development are completed. Finally, the system is verified and tested with the LeNet-5, AlexNet, and VGG-16 CNN models.

The results show that the designed inference acceleration system can accommodate CNN models with convolution kernels of different sizes. On the ZedBoard platform, the system achieves 0.08 GOP/s, 8.4 GOP/s, and 32.6 GOP/s for LeNet-5, AlexNet, and VGG-16, respectively. For VGG-16, the model with the largest amount of computation, comparative tests show that the acceleration system achieves 32.1 times the speed and 503 times the efficiency of the CPU, while the precision loss remains below 3%.
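As a rough illustration of the fixed-point quantization step mentioned above, the following Python sketch shows one common symmetric scheme with a per-tensor power-of-two scale. The abstract does not specify the thesis's exact quantization method, so the bit width, scaling rule, and function names here are illustrative assumptions rather than the author's implementation.

# Minimal sketch of symmetric fixed-point quantization for CNN weights/activations.
# Assumption: an 8-bit signed format with a per-tensor power-of-two scale; the
# thesis's actual scheme may differ (e.g., per-layer bit widths or calibration).
import numpy as np

def quantize_fixed_point(x: np.ndarray, total_bits: int = 8) -> tuple[np.ndarray, int]:
    """Quantize a float tensor to signed fixed-point; return (integers, fractional bits)."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x, dtype=np.int8), total_bits - 1
    # Bits needed for the integer part (sign bit excluded).
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale),
                -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1)
    return q.astype(np.int8), frac_bits

def dequantize(q: np.ndarray, frac_bits: int) -> np.ndarray:
    """Recover an approximate float tensor from the fixed-point representation."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# Example: quantize a small weight tensor and check the reconstruction error,
# i.e. the kind of precision loss the comparative tests in the abstract refer to.
weights = np.random.randn(3, 3).astype(np.float32) * 0.5
q, frac_bits = quantize_fixed_point(weights)
error = np.max(np.abs(weights - dequantize(q, frac_bits)))
print(f"fractional bits: {frac_bits}, max abs error: {error:.4f}")

Using a power-of-two scale keeps dequantization as a simple bit shift on the FPGA side, which is one common reason fixed-point formats are preferred over floating point in embedded CNN accelerators.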
Keywords/Search Tags:CNN, Accelerator, Quantization, Winograd, Zynq, FPGA