
An Embedded Inference Framework for Deep Convolutional Neural Networks: Design and Implementation

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhang
Full Text: PDF
GTID: 2428330611966931
Subject: Computer Science and Technology

Abstract/Summary:
Since 2012, deep convolutional neural networks have driven rapid progress in computer vision. To achieve better accuracy, researchers have designed ever-larger networks, from AlexNet to VGG, GoogLeNet, and ResNet. Because of privacy or network constraints, data on embedded and mobile platforms often cannot be sent to a server, so deploying convolutional neural networks directly on embedded platforms has become a new research trend. Networks with huge parameter counts are unsuitable for embedded platforms with limited computing resources, which has motivated lightweight architectures designed specifically for embedded and mobile platforms, such as MobileNet, ShuffleNet, and SqueezeNet. These lightweight networks, however, are trained on desktop platforms, and running them on an embedded platform without adaptation yields poor inference performance. To address this problem, this thesis designs and implements an inference framework for deep convolutional neural networks on embedded platforms.

The framework targets embedded platforms with ARM CPUs and consists of four components: a model conversion component, a runtime, basic network components (Net, Layer, Blob), and acceleration components. We analyze, both theoretically and experimentally, the performance bottlenecks of lightweight convolutional neural networks during inference on embedded platforms, and design corresponding optimization algorithms for 1×1 standard convolution and 3×3 depthwise separable convolution to speed up these layers. We also design a memory pool algorithm that eliminates the frequent memory allocation and deallocation introduced by the framework's lightweight inference mode, improving the performance of the framework as a whole.

Finally, the framework is evaluated on a Firefly-RK3399 development board. Experimental results show that the 1×1 convolution optimization improves performance by about 70%-90% over the unoptimized baseline, and that the 3×3 depthwise convolution optimization yields about a 50% improvement when the computational load is heavy and the CPU is weak. In combination with the lightweight inference mode, the memory pool algorithm reduces the memory footprint during inference without losing inference performance.
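To make the first optimization target concrete: a 1×1 convolution over an NCHW tensor is exactly a matrix multiplication between the Cout×Cin weight matrix and the Cin×(H·W) input, which is what makes GEMM-style blocking and vectorization applicable. The C++ reference sketch below illustrates the unoptimized computation; the layout, names, and signature are our own assumptions, not the thesis's code.

    #include <vector>
    #include <cstddef>

    // A 1x1 convolution in NCHW layout is a plain matrix multiply:
    // out[co][i] = sum_ci wgt[co][ci] * in[ci][i], i ranging over H*W pixels.
    // Illustrative reference code; an optimized version would block these
    // loops for cache reuse and vectorize the inner loop with NEON.
    void conv1x1(const std::vector<float>& in,  int Cin, int HW,
                 const std::vector<float>& wgt, int Cout, // Cout x Cin
                 std::vector<float>& out)                 // Cout x HW
    {
        out.assign(static_cast<size_t>(Cout) * HW, 0.0f);
        for (int co = 0; co < Cout; ++co)
            for (int ci = 0; ci < Cin; ++ci) {
                const float w    = wgt[static_cast<size_t>(co) * Cin + ci];
                const float* src = &in[static_cast<size_t>(ci) * HW];
                float* dst       = &out[static_cast<size_t>(co) * HW];
                for (int i = 0; i < HW; ++i)
                    dst[i] += w * src[i];
            }
    }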
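Similarly, a 3×3 depthwise convolution applies one 3×3 kernel per channel, so its arithmetic intensity is low and memory access order dominates performance. Below is a minimal, unoptimized C++ reference (stride 1, no padding, NCHW); all names are illustrative assumptions, and the thesis's optimized kernel would restructure these loops and use ARM NEON intrinsics.

    #include <vector>
    #include <cstddef>

    // Naive 3x3 depthwise convolution, stride 1, no padding, NCHW layout.
    // Each input channel is convolved with its own 3x3 kernel (hypothetical
    // reference code, not the framework's optimized implementation).
    void depthwise_conv3x3(const std::vector<float>& in, int C, int H, int W,
                           const std::vector<float>& kernels, // C * 9 weights
                           std::vector<float>& out)           // C*(H-2)*(W-2)
    {
        const int OH = H - 2, OW = W - 2;
        out.assign(static_cast<size_t>(C) * OH * OW, 0.0f);
        for (int c = 0; c < C; ++c) {
            const float* src = &in[static_cast<size_t>(c) * H * W];
            const float* k   = &kernels[static_cast<size_t>(c) * 9];
            float* dst       = &out[static_cast<size_t>(c) * OH * OW];
            for (int y = 0; y < OH; ++y)
                for (int x = 0; x < OW; ++x) {
                    float acc = 0.0f;
                    for (int ky = 0; ky < 3; ++ky)
                        for (int kx = 0; kx < 3; ++kx)
                            acc += src[(y + ky) * W + (x + kx)] * k[ky * 3 + kx];
                    dst[y * OW + x] = acc;
                }
        }
    }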
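The memory pool idea can be sketched as follows: instead of calling malloc and free for every Blob at every layer, freed buffers are kept in a size-ordered free list and reused for later requests that fit. This is a minimal illustrative sketch under our own naming assumptions, not the thesis's actual algorithm.

    #include <cstdlib>
    #include <map>

    // Minimal memory pool: released blocks go into a size-ordered free list
    // and are reused for later requests, avoiding repeated malloc/free
    // during inference (illustrative sketch; names are assumptions).
    class MemoryPool {
    public:
        void* acquire(std::size_t bytes) {
            // Reuse the smallest previously freed block that is large enough.
            auto it = free_.lower_bound(bytes);
            if (it != free_.end()) {
                void* p = it->second;
                free_.erase(it);
                return p;
            }
            // No suitable block: fall back to a fresh allocation.
            void* p = std::malloc(bytes);
            sizes_[p] = bytes;
            return p;
        }
        void release(void* p) {
            // Return the block to the pool instead of freeing it.
            free_.emplace(sizes_[p], p);
        }
        ~MemoryPool() {
            // Frees pooled blocks; blocks still in use would leak here.
            for (auto& kv : free_) std::free(kv.second);
        }
    private:
        std::multimap<std::size_t, void*> free_; // size -> reusable block
        std::map<void*, std::size_t> sizes_;     // block -> allocated size
    };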
Keywords/Search Tags: Embedded deployment of convolutional neural networks, forward computing framework of convolutional neural networks, optimization on embedded platforms, optimization of convolution