Implementation Of Convolutional Neural Network Based On All Programmable SOC

Posted on:2018-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhu

Full Text:PDF

GTID:2428330518958657

Subject:Internet of Things works

Abstract/Summary:

In recent years, Convolutional Neural Network (CNN) based methods have achieved great success in a large number of applications and have been widely adopted for image recognition because of their high accuracy. With the Internet of Things and today's vast number of devices able to capture pictures and video, there is a significant market for embedded systems that require high-accuracy, real-time object recognition, such as self-driving cars and robots. However, CNN-based methods are computationally intensive and resource-hungry, and are therefore hard to integrate into embedded systems such as smartphones, smart glasses, and robots. The FPGA is one of the most promising platforms for accelerating CNNs because it offers high performance, reconfigurability, high energy efficiency, and short development cycles. The Xilinx Zynq-7000 All Programmable System-on-Chip (SoC) consists of a dual-core ARM Cortex-A9 MPCore based Processing System (PS) and an Artix-7 FPGA as programmable logic (PL). The system offers the flexibility and scalability of an FPGA while providing performance, power efficiency, and ease of use.

In this thesis, we explore the Zynq platform further for accelerating CNNs and propose a CNN accelerator design for embedded systems. The thesis studies the following: (1) the structural features and design flow of the Zynq-7000 All Programmable SoC; (2) intra-layer parallelism and methods for implementing it; (3) how to design the accelerator architecture around those types of parallelism, and how to explore the design space to find the optimal solution; (4) the RTL design of the CNN accelerator; (5) the implementation of a CNN through hardware/software co-design on the Zynq platform.

The thesis concludes with an experimental comparison. First, we compare our design with equivalent implementations on the dual-core ARM Cortex-A9 and on general-purpose computers in terms of performance and energy efficiency. Results show that the system with hardware acceleration achieves 48 images/Joule, which is 8×, 16×, and 9.6× higher than the dual-core ARM Cortex-A9, the desktop, and the laptop, respectively. We then compare it with other high-performance CNN accelerators: our design does not match their performance, but it has a considerable price advantage. According to the comparison results, this design can implement a CNN on a hardware platform at low cost and achieve high performance per watt, and so meets the low-energy, low-cost needs of embedded systems.
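The energy-efficiency figures quoted in the abstract can be unpacked with a little arithmetic. The sketch below is only illustrative: it assumes that "N× higher" means a simple ratio of images/Joule, and the function and variable names are hypothetical, not taken from the thesis. The baseline values it prints are implied by the quoted ratios and are not reported in the abstract.

```python
# Figures quoted in the abstract.
ACCEL_IMAGES_PER_JOULE = 48.0  # hardware-accelerated Zynq system

# Reported improvement factors of the accelerator over each baseline.
IMPROVEMENT_OVER = {
    "dual-core ARM Cortex-A9": 8.0,
    "desktop": 16.0,
    "laptop": 9.6,
}


def images_per_joule(images_per_second: float, watts: float) -> float:
    """Energy efficiency: (images/s) / (J/s) = images/J."""
    return images_per_second / watts


def implied_baselines(accel: float, factors: dict) -> dict:
    """Back out each baseline's images/Joule, assuming a plain ratio."""
    return {name: accel / factor for name, factor in factors.items()}


baselines = implied_baselines(ACCEL_IMAGES_PER_JOULE, IMPROVEMENT_OVER)
for name, value in baselines.items():
    print(f"{name}: about {value:.1f} images/Joule (implied)")
```

Under that reading, the baselines would sit at roughly 6, 3, and 5 images/Joule. The helper `images_per_joule` shows how the metric itself combines throughput and power; the abstract does not give the individual throughput or power numbers, so none are filled in here.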

Keywords/Search Tags:

Convolutional Neural Network, All Programmable SOC, Hardware Acceleration, Hardware/Software Co-design, Parallelism

Related items

1	The Research And Implementation Of Convolutional Neural Network Based On FPGA
2	Co-design And Implementation Of Hardware/Software Of Convolutional Neural Network Based On FPGA
3	Acceleration System Design And Implement For Convolutional Neural Network Based On SOC FPGA
4	Research And Design Of Convolutional Neural Network Accelerator Based On Multi-FPGA Co-acceleration
5	Hardware Accelerator Design Of Convolutional Neural Networks For Low Power And High Performance
6	Design And Implementation Of Convolutional Neural Network Hardware Accelerator
7	Model Compression And Hardware Acceleration Of Convolutional Neural Networks
8	ZYNQ-Based Reconfigurable Convolutional Neural Network Accelerator
9	Research And Implementation Of FPGA Accelerated Convolutional Neural Network Training
10	Research On Hardware Acceleration Of 3D Convolutional Neural Network Algorithm Based On DSP