
Compressing And Accelerating Deep Neural Networks

Posted on: 2019-04-25    Degree: Master    Type: Thesis
Country: China    Candidate: X F Xiao    Full Text: PDF
GTID: 2428330566486037    Subject: Physical Electronics
Abstract/Summary:
Smart portable devices such as mobile phones and tablet PCs have become increasingly popular in recent years, owing to the continuous development of mobile communication networks and the Internet. Users therefore demand more capable intelligent applications, especially in computer vision and text recognition. In these pattern recognition fields, algorithms based on deep neural networks (DNN), including deep convolutional neural networks (CNN) and deep recurrent neural networks (RNN), achieve much higher accuracy than traditional algorithms. However, DNN-based algorithms also require far more computation and parameter storage than traditional ones, which leads to longer inference latency and a larger memory footprint; this is why they are difficult to deploy at scale on mobile devices. Motivated by the large parameter count and high computational complexity of DNNs, this thesis focuses on network acceleration algorithms, model compression algorithms, and innovative ways of combining them organically. The main contributions are as follows:

(1) For accelerating and compressing CNNs, Global Supervised Low-rank Expansion (GSLRE) is proposed, which effectively solves the problem of large accuracy loss when the convolutional layers are decomposed into low-rank factors; it still uses the label information of the input data as its convergence target. An Adaptive Drop-weight (ADW) scheme is also proposed to mitigate the problems of traditional pruning: when the per-layer threshold is set too large, the network's performance drops drastically, and when it is set too small, the compression ratio is insufficient. ADW instead increases the threshold dynamically and prunes the parameters of each layer gradually. In addition, Connection Redundancy Analysis (CRA) is introduced to analyze the connection redundancy of each layer under a given accuracy-loss threshold, which guides the setting of a proper pruning ratio for each layer. Combining GSLRE, ADW, and CRA to accelerate and compress a CNN for offline handwritten Chinese character recognition (HCCR) reduces the network's computational cost by nine times and compresses the network to 1/18 of the baseline model's size, with only a 0.21% drop in accuracy. The resulting model still surpasses the most accurate single model reported on this dataset, while using fewer parameters and less computation.

(2) For accelerating and compressing RNNs, Singular Value Decomposition (SVD) is adopted to factorize the weight matrices in the LSTM and fully connected layers so as to reduce the computational cost. ADW is then used to gradually remove redundant connections in each layer and reduce the parameter storage. Integrating SVD and ADW to accelerate and compress an LSTM-based network for online HCCR reduces the network's computational cost by 31 times and compresses the network by about 13.6 times relative to the baseline model, with only a 0.5% drop in accuracy.
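As a rough illustration of the low-rank step in (2), the following is a minimal numpy sketch of truncated-SVD factorization of a weight matrix; the matrix shape, the rank, and the helper name are illustrative assumptions, not the settings or code used in the thesis.

    import numpy as np

    def low_rank_factorize(W, r):
        # Truncated SVD: keep the r largest singular values so that
        # W is approximated by the product of two thin matrices.
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        U_r = U[:, :r] * S[:r]   # absorb singular values into the left factor
        V_r = Vt[:r, :]
        return U_r, V_r          # W is approximated by U_r @ V_r

    # Illustrative example: a 1024x512 gate matrix approximated with rank 64,
    # replacing one 1024x512 product with a 64x512 and a 1024x64 product.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 512))
    U_r, V_r = low_rank_factorize(W, 64)
    x = rng.standard_normal(512)
    y_approx = U_r @ (V_r @ x)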
(3) For DNN forward computation, this thesis introduces graph-level optimizations, mainly the fusion of the activation function layer and the integration of the batch normalization layer and the scale layer into the preceding layer by precomputing their parameters, so that their computation is eliminated at inference time. The forward computation of CNNs and RNNs is then analyzed, and loop unrolling, a BLAS library, and sparse matrix multiplication are used to reduce the forward time. Ultimately, on a single-threaded CPU, recognizing one offline handwritten Chinese character takes only 9.7 ms with a 2.3 MB model, and recognizing one online handwritten Chinese character takes only 2.7 ms with a 0.45 MB model.
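As a rough sketch of the batch-normalization folding described in (3), the following shows how BN (and scale) parameters can be precomputed into the preceding convolution's weights and bias so that the BN/scale layer costs nothing at inference; the shapes and the helper name are assumptions for illustration, not the thesis's implementation.

    import numpy as np

    def fold_bn_into_conv(W, b, mean, var, gamma, beta, eps=1e-5):
        # W: conv weights (out_channels, in_channels, kH, kW); b: bias (out_channels,).
        # mean/var: BN running statistics; gamma/beta: learned scale and shift.
        scale = gamma / np.sqrt(var + eps)          # per-output-channel factor
        W_folded = W * scale.reshape(-1, 1, 1, 1)   # scale each output channel
        b_folded = (b - mean) * scale + beta        # precompute the new bias
        return W_folded, b_folded                   # BN layer can now be dropped

    # Illustrative usage with random parameters for a 3x3 conv, 16 output channels.
    rng = np.random.default_rng(1)
    W = rng.standard_normal((16, 8, 3, 3)); b = rng.standard_normal(16)
    mean, var = rng.standard_normal(16), rng.random(16) + 0.1
    gamma, beta = rng.standard_normal(16), rng.standard_normal(16)
    W_f, b_f = fold_bn_into_conv(W, b, mean, var, gamma, beta)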
Keywords/Search Tags: Convolutional neural network, Recurrent neural network, Model compression, Forward acceleration