
Implementation And Rapid Deployment Of Embedded Convolutional Neural Networks Based On APSoC Architecture

Posted on: 2020-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2428330572480078
Subject: Control theory and control engineering
Abstract/Summary:
In recent years, the continued development of deep learning on embedded systems has contributed significantly to the popularization of artificial intelligence. Convolutional neural networks (CNNs), one such deep learning method, have achieved major breakthroughs in image recognition, object detection, and image segmentation. "Server training + embedded deployment" has become a common research and development model. This thesis therefore aims to build a CNN framework and deploy it rapidly on a high-performance FPGA-based embedded platform, which provides FPGA fabric for high-performance computing (HPC) tasks together with an ARM Linux system that exposes high-level-language design interfaces.
To achieve this objective, two groups of factors must be considered: on the FPGA side, a high-throughput hardware structure, the data stream interface, memory architecture, design parameterization, performance optimization, and scalability; on the ARM side, a high-level interface framework and a CPU-FPGA data transfer API.
In view of these considerations, the main work of this thesis is: 1) the overall hardware and software design of the embedded system; 2) introducing the synchronous dataflow (SDF) model for hardware structure design, with weights reloaded from off-chip storage; 3) parameterizing the FPGA accelerator through high-level synthesis (HLS) and optimizing CNN performance using folding factors, "interlacing", and loop pipelining; 4) making the hardware framework scalable by partitioning the complete SDF graph (SDFG) into SDF subgraphs; 5) building the CNN on the embedded system and deploying it rapidly.
Finally, for the mainstream LeNet-5 model, this thesis compares its design against three different hardware frameworks on FPGA platforms of the same capacity, which achieve 0.216 GOPs [46], 0.48 GOPs [43], and 0.988 GOPs [56]. The designed hardware framework achieves a throughput of up to 1.863 GOPs, exceeding all three at the same FPGA capacity. A similar hardware framework from this thesis achieves a throughput of 3.25 GOPs on the CIFAR-10 model, which is 42 times the performance of the ARM Cortex-A9 processor. In addition, the 32-bit ARM processor achieves 98.75% and 85.71% accuracy on the MNIST handwritten digit dataset and the CIFAR-10 image classification dataset, respectively, while the FPGA achieves 98.4% and 84.42%, an accuracy loss of 0.35 and 1.29 percentage points.
The innovations of this thesis are as follows: 1) in the APSoC architecture, the FPGA hardware design point is represented as an SDFG, and complex CNN models are realized by partitioning it into SDF subgraphs; 2) "interlacing" and folding are proposed to optimize the accelerator and improve throughput.
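
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of the thesis; the variable names are illustrative, and the implied ARM throughput assumes the "42 times" comparison refers to the same CIFAR-10 workload measured in GOPs.

```python
# Throughput (GOPs) of the three cited LeNet-5 designs, keyed by citation.
prior_gops = {"[46]": 0.216, "[43]": 0.48, "[56]": 0.988}
proposed_lenet_gops = 1.863  # throughput reported for the proposed framework

# Ratio of the proposed framework's throughput to each cited design.
speedups = {ref: proposed_lenet_gops / gops for ref, gops in prior_gops.items()}

# CIFAR-10: 3.25 GOPs is stated to be 42x the ARM Cortex-A9.
# Assumption: both numbers describe the same workload, so the CPU
# throughput is roughly 3.25 / 42 GOPs.
cifar_fpga_gops = 3.25
implied_arm_gops = cifar_fpga_gops / 42

# Accuracy in percent as (ARM, FPGA) pairs, and the loss in percentage points.
accuracy = {"MNIST": (98.75, 98.4), "CIFAR-10": (85.71, 84.42)}
accuracy_loss = {name: round(arm - fpga, 2) for name, (arm, fpga) in accuracy.items()}

for ref, ratio in speedups.items():
    print(f"LeNet-5 speedup vs {ref}: {ratio:.2f}x")
print(f"Implied ARM throughput on CIFAR-10: {implied_arm_gops:.4f} GOPs")
for name, loss in accuracy_loss.items():
    print(f"{name} accuracy loss: {loss:.2f} percentage points")
```

The computed accuracy losses match the 0.35 and 1.29 stated in the abstract, and the smallest throughput ratio (against the 0.988 GOPs design) is a little under 2x.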
Keywords/Search Tags: Convolutional Neural Network, APSoC, Synchronous dataflow, High-level Synthesis, Rapid deployment