
Design and FPGA Implementation of a Convolutional Neural Network Acceleration Circuit

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: B F Liu
GTID: 2428330626450802
Subject: Integrated circuit engineering
Abstract/Summary:
Convolutional neural networks (CNNs) have achieved great success in image processing by modeling the behavior of the optic nerve in living creatures, and are widely used in image classification, machine vision, pattern recognition and other fields. CNNs are computationally intensive, and their specific computation pattern makes it difficult to implement efficient CNN applications on general-purpose processors. As a common hardware acceleration platform, an FPGA can map complex algorithms onto its internal configurable hardware resources to achieve parallel computation, which offers a new approach to deploying CNNs on embedded devices. Based on an in-depth analysis of the CNN computing model, this thesis proposes a Zynq-based CNN acceleration system built with a software/hardware co-design methodology. The system consists mainly of a configurable hardware accelerator based on a Xilinx Artix-7 FPGA and a software processing system based on an ARM Cortex-A9 CPU.

The main work of this thesis includes:
(1) A configurable hardware acceleration circuit was designed and implemented using high-level synthesis. In its computing engine, parallel computation and pipeline optimization provide the computational acceleration. In its on-chip cache system, a memory partitioning strategy matches communication bandwidth to computation throughput. In its control logic, a global register list stores parameters and controls the whole accelerator.
(2) The acceleration circuit was integrated into an ARM-centered SoC using the Vivado IDE, and a DMA + AXI4-Stream communication scheme carries data between the PS side and the PL side.
(3) A software processing system for fast deployment of pretrained Caffe models on the accelerator was designed on top of the Pynq framework, and a parameterized user interface was provided by encapsulating the underlying DMA driver.

A typical CNN model for handwritten digit recognition was selected to test and verify the accelerator on a Xilinx Pynq-Z1 evaluation board. The experimental results show that, at a working frequency of 100 MHz, the accelerator runs the handwritten digit recognition network at 22.65 FPS, a speedup of at least 25.9 times over the ARM Cortex-A9 CPU working at 650 MHz. The average power of the accelerator is only 1.59 W. In summary, the accelerator achieves a good acceleration effect for CNN applications compared with the CPU while keeping power consumption low. In addition, the accelerator is highly configurable and enables rapid deployment of CNN applications on embedded devices or mobile terminals.
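The reported figures can be turned into a few derived quantities with simple arithmetic. The sketch below is illustrative only and is not taken from the thesis: the function and key names are hypothetical, the CPU numbers follow from treating 25.9 as a lower bound on the speedup, and the energy-per-frame estimate assumes that the 1.59 W average power was measured while the accelerator sustained 22.65 FPS.

```python
# Figures quoted in the abstract.
ACCEL_FPS = 22.65      # accelerator throughput at 100 MHz
MIN_SPEEDUP = 25.9     # "at least" speedup over the Cortex-A9 at 650 MHz
ACCEL_POWER_W = 1.59   # average accelerator power


def derived_metrics(fps, min_speedup, power_w):
    """Derive latency, implied CPU throughput and energy per frame.

    Hypothetical helper for illustration; names are not from the thesis.
    """
    accel_latency_ms = 1000.0 / fps
    # A speedup of at least `min_speedup` bounds the CPU throughput from above.
    cpu_fps_upper_bound = fps / min_speedup
    cpu_latency_ms_lower_bound = 1000.0 / cpu_fps_upper_bound
    # Assumes the average power applies while running at `fps`.
    energy_per_frame_j = power_w / fps
    return {
        "accel_latency_ms": accel_latency_ms,
        "cpu_fps_upper_bound": cpu_fps_upper_bound,
        "cpu_latency_ms_lower_bound": cpu_latency_ms_lower_bound,
        "energy_per_frame_j": energy_per_frame_j,
    }


metrics = derived_metrics(ACCEL_FPS, MIN_SPEEDUP, ACCEL_POWER_W)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the accelerator takes roughly 44 ms per frame and about 0.07 J per frame, while the CPU baseline runs at no more than about 0.87 FPS, or more than a second per frame.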
Keywords/Search Tags:Hardware Accelerator, Convolutional Neural Networks, FPGA, SOC, Artificial Intelligence