Implementation And Rapid Deployment Of Embedded Convolutional Neural Networks Based On APSoC Architecture

Posted on:2020-11-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Wang

Full Text:PDF

GTID:2428330572480078

Subject:Control theory and control engineering

Abstract/Summary:

In recent years, the continuous development of deep learning on embedded systems has contributed greatly to the popularization of artificial intelligence. Among deep learning methods, the convolutional neural network (CNN) has achieved major breakthroughs in image recognition, object detection and image segmentation, and "server training + embedded deployment" has become a common research and development model. This paper therefore aims to build a CNN framework and deploy it rapidly on a high-performance FPGA-based embedded platform. The platform provides an FPGA for high-performance computing (HPC) tasks together with an ARM Linux system that offers a high-level-language design interface. To achieve these objectives, two groups of factors must be considered: on the FPGA side, a high-throughput hardware structure, the dataflow interface, the memory architecture, design parameterization, performance optimization and scalability; on the ARM side, a high-level interface framework and the CPU-FPGA data-transfer API.

In view of these considerations, the main work of this paper is: 1) the overall design of the embedded system's software and hardware; 2) introducing the synchronous dataflow (SDF) model into the hardware structure design and reloading weights from off-chip storage; 3) parameterizing the FPGA accelerator through high-level synthesis (HLS) and optimizing CNN performance using the folding factor, "interlacing" and loop pipelining; 4) making the hardware framework scalable by segmenting the complete SDF graph (SDFG) into SDF subgraphs; 5) building the CNN on the embedded system and deploying it rapidly.

Finally, for the mainstream LeNet-5 model, this paper compares against three different hardware frameworks on FPGA platforms of the same capacity, whose throughputs are 0.216 GOPs [46], 0.48 GOPs [43] and 0.988 GOPs [56]. The designed hardware framework reaches a throughput of up to 1.863 GOPs, far exceeding these architectures on FPGAs of the same capacity. A similar hardware framework in this paper achieves a throughput of 3.25 GOPs on the CIFAR-10 model, 42 times the performance of the ARM Cortex-A9 processor. In addition, the 32-bit ARM processor achieves 98.75% and 85.71% accuracy on the MNIST handwritten digit dataset and the CIFAR-10 image classification dataset respectively, while the FPGA achieves 98.4% and 84.42%, an accuracy loss of 0.35 and 1.29 percentage points.

The innovations of this paper are as follows: 1) in the APSoC architecture, the FPGA hardware design point is represented as an SDFG, and complex CNN models are realized by segmenting it into SDF subgraphs; 2) "interlacing" and folding are proposed to optimize the accelerator and improve throughput.
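The abstract describes the hardware structure as a synchronous dataflow graph (SDFG). In SDF, every actor produces and consumes a fixed number of tokens per firing, so a repeating schedule exists only if the balance equations q[src] · prod = q[dst] · cons have a positive integer solution, the repetition vector. The sketch below solves those equations for a small chain; the actor names and token rates are hypothetical illustrations, not the thesis's actual LeNet-5 graph.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm  # math.lcm needs Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    `edges` holds (src, dst, prod, cons) tuples with positive token rates.
    Returns the smallest positive integer firing count per actor, or raises
    ValueError if the graph is disconnected or its rates are inconsistent.
    """
    q = {actors[0]: Fraction(1)}  # fix one actor's rate, propagate the rest
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * prod / cons
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * cons / prod
                changed = True
    if len(q) != len(actors):
        raise ValueError("SDF graph is not connected")
    for src, dst, prod, cons in edges:
        if q[src] * prod != q[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    # Scale the rational solution to the smallest integer vector.
    scale = reduce(lcm, (f.denominator for f in q.values()), 1)
    counts = {a: int(f * scale) for a, f in q.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# Hypothetical chain: pool1 consumes a 2x2 window (4 tokens) per firing and
# fc fires once per 16 pooled values.
actors = ["input", "conv1", "pool1", "fc"]
edges = [
    ("input", "conv1", 1, 1),
    ("conv1", "pool1", 1, 4),
    ("pool1", "fc", 1, 16),
]
q = repetition_vector(actors, edges)
print(q)  # {'input': 64, 'conv1': 64, 'pool1': 16, 'fc': 1}
```

Reading the result: within one period conv1 fires 64 times and pool1 fires 16 times, consuming exactly the 64 tokens conv1 produced, so every edge buffer returns to its initial state and the schedule can repeat indefinitely.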
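The abstract credits part of the throughput gain to the folding factor and loop pipelining. One common way to reason about this in a layer-pipelined accelerator (assumed here, not taken from the thesis) is: a fully unrolled convolution layer needs k·k·C_in·C_out multipliers per output pixel; a folding factor F time-multiplexes them onto 1/F as many units, and with a loop pipeline of initiation interval 1 each output pixel then costs F cycles. Because every stage processes a different image at the same time, the slowest stage sets the throughput. Only the LeNet-5 convolution shapes below are standard; the folding factors, the 100 MHz clock and the conv-only operation count are hypothetical, and the result is not meant to reproduce the thesis's measured 1.863 GOPs.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    name: str
    k: int       # kernel is k x k
    c_in: int
    c_out: int
    h_out: int
    w_out: int
    fold: int    # hypothetical folding factor (time-multiplexing of the MACs)

    @property
    def macs_per_pixel(self) -> int:
        # MACs for one output pixel position across all output channels
        return self.k * self.k * self.c_in * self.c_out

    @property
    def ops(self) -> int:
        # count one multiply and one add per MAC
        return 2 * self.macs_per_pixel * self.h_out * self.w_out

    @property
    def mac_units(self) -> int:
        if self.macs_per_pixel % self.fold:
            raise ValueError(f"{self.name}: fold must divide {self.macs_per_pixel}")
        return self.macs_per_pixel // self.fold

    @property
    def cycles(self) -> int:
        # pipelined loop with initiation interval 1: `fold` cycles per pixel
        return self.h_out * self.w_out * self.fold


def pipeline_gops(layers, f_clk_hz):
    """Steady-state throughput of a layer-pipelined dataflow accelerator:
    each stage works on a different image, so the slowest stage sets the rate."""
    total_ops = sum(layer.ops for layer in layers)
    bottleneck = max(layer.cycles for layer in layers)
    return total_ops * f_clk_hz / bottleneck / 1e9


# LeNet-5 convolution shapes; folding factors and the 100 MHz clock are hypothetical.
layers = [
    ConvLayer("conv1", k=5, c_in=1, c_out=6, h_out=28, w_out=28, fold=25),
    ConvLayer("conv2", k=5, c_in=6, c_out=16, h_out=10, w_out=10, fold=200),
]
for layer in layers:
    print(layer.name, layer.mac_units, layer.cycles)
# conv1 6 19600
# conv2 12 20000
print(round(pipeline_gops(layers, 100e6), 3))  # 3.576
```

In this toy setting, halving conv2's folding factor to 100 would double its multipliers to 24 but only move the bottleneck from 20,000 cycles to conv1's 19,600, so balancing per-stage cycle counts, rather than maximizing parallelism in one layer, is what the folding factor lets a designer trade against DSP usage.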
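The first innovation listed in the abstract realizes complex CNNs by segmenting the complete SDFG into SDF subgraphs. The abstract does not state the segmentation rule, so the sketch below swaps in a simple greedy rule for a linear graph: walk the layers in order and start a new subgraph whenever the next actor would push the current subgraph's resource cost over the FPGA budget. The function name, the per-layer DSP costs and the budget of 100 are all hypothetical.

```python
def partition_chain(stages, budget):
    """Greedily cut a linear SDF graph into contiguous subgraphs.

    `stages` is an ordered list of (actor_name, resource_cost) pairs; each
    returned subgraph has a total cost within `budget`. Tokens on a cut edge
    would be buffered in off-chip memory between subgraph runs.
    """
    parts, current, used = [], [], 0
    for name, cost in stages:
        if cost > budget:
            raise ValueError(f"{name} alone exceeds the budget ({cost} > {budget})")
        if used + cost > budget:  # close the current subgraph, start a new one
            parts.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        parts.append(current)
    return parts


# Hypothetical DSP costs for a LeNet-5-like chain and a hypothetical budget.
stages = [("conv1", 30), ("pool1", 0), ("conv2", 60), ("pool2", 0),
          ("fc1", 80), ("fc2", 40), ("fc3", 10)]
parts = partition_chain(stages, budget=100)
print(parts)
# [['conv1', 'pool1', 'conv2', 'pool2'], ['fc1'], ['fc2', 'fc3']]
```

Greedy contiguous cutting is easy to reason about but not optimal in general, and it ignores the off-chip traffic each cut adds; a practical partitioner would weigh that traffic against the resource savings.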

Keywords/Search Tags:

Convolutional Neural Network, APSoC, Synchronous dataflow, High-level Synthesis, Rapid deployment

Related items

1	Research On Dataflow Architecture-based High Level Synthesis For Graph Processing
2	Research On Key Techniques Of Deep Convolutional Neural Network Accelerators Based On FPGA Bus Framework
3	Research On Convolutional Neural Network Acceleration Framework For Cloud-based FPGAs
4	The Convolutional Neural Network Accelerator Research Based On The Tiling Dataflow
5	Application Research And Design Of Deep Neural Network Based On FPGA
6	Research On Neural Network Accelerator Based On PYNQ
7	High level synthesis of neural network chips
8	Embedded Implementation And Algorithm Optimization Of Gesture Recognition Based On Convolutional Neural Network
9	Research Of Scalability On FPGA-based Neural Network Accelerator
10	The Research Of Allocation Method Based On Low Power In High-level Synthesis