
Implementation Of Convolutional Neural Network Based On All Programmable SOC

Posted on: 2018-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhu
GTID: 2428330518958657
Subject: Internet of Things works
Abstract/Summary:
In recent years, methods based on Convolutional Neural Networks (CNNs) have achieved great success in a wide range of applications and, because of their high accuracy, have been widely adopted for image recognition. With the Internet of Things and the enormous number of devices now able to capture pictures and videos, there is a significant market for embedded systems that require high-accuracy, real-time object recognition, such as self-driving cars and robots. However, CNN-based methods are computationally intensive and resource-hungry, which makes them hard to integrate into embedded systems such as smartphones, smart glasses, and robots. FPGAs are among the most promising platforms for accelerating CNNs because they offer high performance, reconfigurability, high energy efficiency, and short development cycles. The Xilinx Zynq-7000 All Programmable System-on-Chip (SoC) combines a Processing System (PS) built around a dual-core ARM Cortex-A9 MPCore with Artix-7 FPGA fabric serving as Programmable Logic (PL). It offers the flexibility and scalability of an FPGA while also providing high performance, low power, and ease of use.

In this thesis, we study CNN acceleration on the Zynq platform in depth and propose a CNN accelerator design for embedded systems. The thesis covers the following:
(1) the architectural features and design flow of the Zynq-7000 All Programmable SoC;
(2) the types of intra-layer parallelism in a CNN and how each can be implemented;
(3) how to design the accelerator architecture around these types of parallelism, and how to explore the design space to find the optimal configuration;
(4) the RTL design of the CNN accelerator;
(5) the implementation of the CNN through hardware/software co-design on the Zynq platform.

Finally, we evaluate the design experimentally. First, we compare it with equivalent implementations on the dual-core ARM Cortex-A9 and on general-purpose computers in terms of performance and energy efficiency. The results show that the hardware-accelerated system achieves 48 images/Joule, which is 8×, 16×, and 9.6× the energy efficiency of the dual-core ARM Cortex-A9, the desktop, and the laptop, respectively. We then compare it with other high-performance CNN accelerators: our design does not reach their performance, but it has a substantial cost advantage. These results show that the design can implement a CNN in hardware at low cost while achieving high performance per watt, meeting the low-energy and low-cost requirements of embedded systems.
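The intra-layer parallelism the abstract refers to can be illustrated with a convolution loop nest. The sketch below is not the thesis's RTL; it assumes a common formulation in which output channels are tiled by a factor `Tm` and input channels by `Tn`, so that the innermost `Tm × Tn` block is what an FPGA accelerator would unroll into parallel multiply-accumulate (MAC) units. The names `conv_reference`, `conv_tiled`, `Tm`, `Tn`, and the data layout are hypothetical, chosen for illustration only.

```python
# Minimal sketch (assumed formulation, not the thesis's design): one convolution
# layer computed two ways. conv_tiled reorders the loops so that the innermost
# Tm x Tn block (output x input channels) is the part an accelerator would unroll
# into parallel MAC units.
# Assumed layout: inp[n][row][col], weights[m][n][i][j]; stride 1, no padding.

def _zeros(M, R, C):
    return [[[0] * C for _ in range(R)] for _ in range(M)]


def conv_reference(inp, weights, K):
    """Straightforward six-level loop nest (software baseline)."""
    N, M = len(inp), len(weights)
    R, C = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = _zeros(M, R, C)
    for m in range(M):
        for n in range(N):
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            out[m][r][c] += weights[m][n][i][j] * inp[n][r + i][c + j]
    return out


def conv_tiled(inp, weights, K, Tm, Tn):
    """Same result, with output/input channels tiled by Tm and Tn."""
    N, M = len(inp), len(weights)
    R, C = len(inp[0]) - K + 1, len(inp[0][0]) - K + 1
    out = _zeros(M, R, C)
    for m0 in range(0, M, Tm):          # tile over output channels
        for n0 in range(0, N, Tn):      # tile over input channels
            for r in range(R):
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            # In hardware these two loops would be fully unrolled:
                            # up to Tm * Tn MACs issued in the same clock cycle.
                            for m in range(m0, min(m0 + Tm, M)):
                                for n in range(n0, min(n0 + Tn, N)):
                                    out[m][r][c] += (
                                        weights[m][n][i][j] * inp[n][r + i][c + j]
                                    )
    return out
```

With integer inputs the two functions return identical results for any `Tm` and `Tn`, including tile sizes that do not divide the channel counts; only the order of accumulation changes, which is exactly what the hardware exploits.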
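Item (3) of the abstract mentions exploring the design space to choose the accelerator configuration. The abstract does not give the cost model used, so the following is only a simplified sketch under stated assumptions: the unroll factors `Tm` and `Tn` are limited by a DSP-slice budget, each MAC is assumed to cost `dsp_per_mac` DSP slices, one unrolled block is assumed to complete per clock cycle, and off-chip memory bandwidth is ignored. The budget of 220 DSP slices is the count on a Zynq XC7Z020, used here purely as an example since the abstract does not name the device; the two LeNet-5-style layers are likewise hypothetical.

```python
# Minimal design-space exploration sketch (assumptions, not the thesis's model):
# - the accelerator unrolls Tm output channels x Tn input channels (Tm*Tn MACs);
# - each MAC costs dsp_per_mac DSP slices (1 is typical for 16-bit fixed point);
# - the pipeline completes one unrolled block per clock cycle;
# - off-chip memory bandwidth is ignored (compute-bound model only).

def ceil_div(a, b):
    return -(-a // b)


def layer_cycles(M, N, R, C, K, Tm, Tn):
    """Estimated cycles for a conv layer: M outputs, N inputs, R x C output, K x K kernel."""
    return ceil_div(M, Tm) * ceil_div(N, Tn) * R * C * K * K


def explore(layers, dsp_budget, dsp_per_mac=1):
    """Return (total_cycles, Tm, Tn) minimizing total cycles under the DSP budget."""
    max_macs = dsp_budget // dsp_per_mac
    best = None
    for Tm in range(1, max_macs + 1):
        for Tn in range(1, max_macs // Tm + 1):
            total = sum(layer_cycles(*layer, Tm, Tn) for layer in layers)
            # Strict '<' keeps the first optimum found (smallest Tm, then smallest Tn).
            if best is None or total < best[0]:
                best = (total, Tm, Tn)
    return best


# Hypothetical LeNet-5-style layers on a 28x28 input: (M, N, R, C, K)
LAYERS = [(6, 1, 24, 24, 5), (16, 6, 8, 8, 5)]
print(explore(LAYERS, dsp_budget=220))  # -> (16000, 16, 6)
```

A single `(Tm, Tn)` pair is shared by all layers because the unrolled MAC array is fixed in hardware; a real exploration would also account for on-chip buffer sizes and memory bandwidth, which this sketch deliberately leaves out.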
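The energy-efficiency figure in the abstract is expressed in images per Joule. Because 1 W = 1 J/s, this equals throughput (images/s) divided by power (W), or equivalently the reciprocal of the energy spent per image. The short sketch below makes that relationship explicit and backs out the baseline efficiencies implied by the reported ratios; the implied values are simple arithmetic on the abstract's numbers, not separately reported measurements, and the function names are hypothetical.

```python
# Energy efficiency as used in the abstract: images per Joule.
# Since 1 W = 1 J/s, images/J = (images/s) / W, the reciprocal of energy per image.

def images_per_joule(images_per_second, power_watts):
    return images_per_second / power_watts


def joules_per_image(images_per_second, power_watts):
    return 1.0 / images_per_joule(images_per_second, power_watts)


# Reported figures: 48 images/J for the accelerated system, and the stated ratios.
ACCELERATED = 48.0
RATIOS = {"dual-core ARM Cortex-A9": 8.0, "desktop": 16.0, "laptop": 9.6}

# Baseline efficiency implied by each ratio (derived, not reported directly).
implied = {name: ACCELERATED / ratio for name, ratio in RATIOS.items()}
for name, eff in implied.items():
    print(f"{name}: {eff:.1f} images/J")
```

The implied baselines are 6, 3, and 5 images/J for the ARM Cortex-A9, the desktop, and the laptop, respectively.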
Keywords/Search Tags:Convolutional Neural Network, All Programmable SOC, Hardware Acceleration, Hardware/Software Co-design, Parallelism