
Optimization Of Neural Networks Based On Partial Binarized Convolution For Embedded Devices

Posted on: 2020-08-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Ling | Full Text: PDF
GTID: 2428330596993890 | Subject: Computer Science and Technology

Abstract/Summary:
With their excellent accuracy, convolutional neural networks (CNNs) have achieved tremendous success in a wide range of tasks such as image classification and object detection. Traditionally, these models are deployed in cloud data centers equipped with high-end GPUs. However, this cloud-centric framework raises issues such as user privacy risks and long response times, and it cannot work at all without an internet connection. Therefore, a growing number of researchers focus on deploying networks directly on resource-constrained embedded systems. However, such systems often cannot afford the computational and storage overhead of CNNs. Several simple and effective approaches have been proposed to address these challenges, but effectively reducing the computational and storage consumption of CNNs remains a challenging problem.

To reduce computational and storage requirements while achieving almost lossless accuracy on large-scale datasets, this paper proposes TaijiNet, a framework based on partial binarized convolution. TaijiNet measures the sensitivity of each layer of a network and the importance of each convolutional kernel, and then binarizes kernels according to their importance. TaijiNet consists of four parts: PCA cumulative energy analysis, partial binarized convolution, pointwise convolution equipping, and network retraining. First, the framework takes a given pre-trained network as input and derives each layer's binarization ratio from its PCA cumulative energy curve and a preset threshold. Then, it applies the partial binarized convolution strategy, measuring the importance of each kernel and decomposing the original convolutional layer into an important full-precision part and an unimportant binary part. Next, TaijiNet equips each binary layer with a pointwise convolutional layer to help it approximate its full-precision counterpart. Finally, TaijiNet retrains the network to recover the lost accuracy. In addition, the framework offers an optional input binarization to further accelerate inference at the expense of accuracy.

The contributions of this paper are as follows:

This paper proposes a strategy called partial binarized convolution, which consists of kernel grouping, layer reconstruction, and channel reordering. It reconstructs the original layers according to the importance of their convolutional kernels without affecting the rest of the model, improving the flexibility of the model and reducing the accuracy loss caused by binarization.

This paper explores the redundancy of convolutional layers and the differences among convolutional kernels. A method based on PCA cumulative energy curves and a method based on the L1 mean are proposed to measure, respectively, the redundancy of each layer of a given network and the importance of each convolutional kernel, so that TaijiNet can apply different quantization strategies according to redundancy and importance.

This paper proposes an accuracy improvement method for binary networks. The method uses pointwise convolutional layers and scaling factors to better approximate the original weights with binary weights, and prevents the pointwise layers from perturbing the network through a one-hot-like initialization.

Experimental results show that when only weights are binarized, the proposed framework obtains a 26x compression rate and 57.9% Top-1 accuracy with all layers of AlexNet processed. Meanwhile, about 85.7% of the floating-point multiply operations are converted to hardware-friendly bit operations. When both inputs and weights are binarized, a 10x to 22x speedup is obtained on the TX1 platform. By choosing different PCA cumulative energy thresholds, the framework can sacrifice performance for improved accuracy, allowing the network to adapt to more environments.
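The first two stages of the pipeline, estimating a layer's redundancy from its PCA cumulative energy curve and splitting kernels by their L1-mean importance, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the exact mapping from number of principal components to binarization ratio, and the 0.95 default threshold are all assumptions for this sketch.

```python
import numpy as np

def binarization_ratio(weights, threshold=0.95):
    """Estimate a layer's redundancy from its PCA cumulative energy curve.

    weights: conv weights of shape (out_ch, in_ch, kh, kw).
    Kernels are flattened to rows; the fewer principal components needed
    to reach `threshold` of the total energy, the more redundant the layer
    and the larger the fraction of kernels we may binarize.
    """
    flat = weights.reshape(weights.shape[0], -1)
    flat = flat - flat.mean(axis=0)
    s = np.linalg.svd(flat, compute_uv=False)      # singular values
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative energy curve
    n_keep = int(np.searchsorted(energy, threshold)) + 1
    return 1.0 - n_keep / flat.shape[0]            # fraction to binarize

def split_by_importance(weights, ratio):
    """Rank kernels by their L1 mean and split them into a full-precision
    group (most important) and a binary group (least important)."""
    importance = np.abs(weights).mean(axis=(1, 2, 3))  # L1 mean per kernel
    order = np.argsort(-importance)                    # most important first
    n_binary = int(round(ratio * weights.shape[0]))
    n_full = weights.shape[0] - n_binary
    return order[:n_full], order[n_full:]
```

The two index groups returned by `split_by_importance` correspond to the important full-precision part and the unimportant binary part into which the original layer is decomposed.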
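The scaling-factor approximation of binary weights and the one-hot-like pointwise initialization can likewise be sketched. Note the hedges: the scaling factor below is the XNOR-Net-style choice (alpha = mean of |W|), and the identity-style 1x1 initialization is one plausible reading of "one-hot-like"; the thesis's exact formulations may differ.

```python
import numpy as np

def binarize_with_scale(w):
    """Approximate real-valued weights by alpha * sign(w), where the
    scaling factor alpha = mean(|w|) minimizes the L2 gap between the
    binary and original weights (XNOR-Net-style assumption)."""
    alpha = np.abs(w).mean()
    b = np.where(w >= 0, 1.0, -1.0)  # avoid sign(0) == 0
    return alpha, b

def one_hot_pointwise_init(channels):
    """One-hot-like initialization of a 1x1 (pointwise) convolution:
    each output channel initially copies exactly one input channel, so
    the pointwise layer starts as an identity map and does not perturb
    the network before retraining."""
    return np.eye(channels, dtype=np.float32).reshape(channels, channels, 1, 1)
```

Because the pointwise layer starts as an identity map, retraining begins from the partially binarized network's own accuracy, and the extra 1x1 layer only learns corrections on top of that.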
Keywords/Search Tags:Model compression, Convolutional neural network, Binarization, PCA, Importance