
Research On Compression And Acceleration Of Deep Neural Network Based On Model Pruning

Posted on: 2021-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: J H Xu
Full Text: PDF
GTID: 2518306476450274
Subject: Signal and Information Processing
Abstract/Summary:
Although deep neural networks have achieved great success in recent years, their high performance comes with high computation and storage costs. The need to deploy deep neural networks on mobile devices therefore motivates research on network acceleration and compression. This thesis focuses on compression and acceleration based on model pruning, aiming to reduce the number of network parameters effectively while maintaining performance. On the one hand, the absolute value of a weight and its change during iterative pruning are used as the basis for judging its importance; on the other hand, the pruning framework is combined with knowledge distillation to achieve a better compression effect. The contributions of this thesis are as follows:

Firstly, we introduce the basic components of convolutional neural networks, including the convolution operation, pooling operation, and activation function, and summarize common regularization methods. We also introduce the motivation behind and basic implementation of the main model compression and acceleration methods, including weight quantization, low-rank decomposition, model pruning, and knowledge distillation. A pruning algorithm based on the absolute value of weights is simulated from both the structured and unstructured perspectives, demonstrating the performance of magnitude-based pruning.

Secondly, building on the weight-magnitude importance criterion, we propose an iterative pruning method that also accounts for weight variation (a sketch of such a criterion follows this abstract). Adding the variation term alleviates over-reliance on the absolute value and protects weights that are currently small but changing rapidly. Compared with magnitude-only pruning on CIFAR-10, VGG-16 and ResNet-56 improve test accuracy by 0.18% and 0.37% at unstructured pruning ratios of 95% and 90%, respectively, and by 0.69% and 0.51% at structured pruning ratios of 80% and 60%. In addition, fixed-proportion pruning is replaced with progressive iterative pruning: more parameters are removed in the early stages, and the pruning strength is reduced in later stages so as not to damage the network structure, which effectively improves the accuracy of the sparse model.

Thirdly, since the essence of pruning is to obtain an efficient network structure rather than to inherit the corresponding weights, the fine-tuning step in the pruning framework is replaced by training from reinitialized weights, and the weight-variation pruning method is verified on the CIFAR-10 dataset. The experimental results show that, compared with the fine-tuning scheme, reinitialized training improves the accuracy of VGG-16 by 0.02% at a 95% pruning ratio, while ResNet-56 loses 0.13%. In the structured pruning experiments, VGG-16 and ResNet-56 improve test accuracy by 0.46% and 0.13% at pruning ratios of 90% and 80%, respectively.

Finally, under the reinitialized-training pruning framework, training is combined with knowledge distillation: the original network serves as the teacher and the sparse network as the student, and training the sparse network on the teacher's soft targets compensates for the accuracy loss caused by pruning (a sketch of this soft-target loss also follows). Validation on CIFAR-10 shows that the accuracy of VGG-16 and ResNet-56 improves by 0.27% and 0.83%, respectively, at a 95% pruning ratio; the gains from structured pruning are more pronounced, with VGG-16 and ResNet-56 gaining 1.4% and 1.31% at pruning ratios of 90% and 80%, respectively.
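The abstract describes the proposed importance criterion only at a high level: a weight's score combines its absolute value with how much the weight changed during iterative pruning. The following PyTorch sketch shows one way such a criterion could be written; the additive combination, the mixing coefficient alpha, and the function names are illustrative assumptions, not the thesis's actual formulation.

import torch

def importance_scores(weight, prev_weight, alpha=0.5):
    # Combine current magnitude with the change since the previous pruning
    # iteration, so weights that are small but moving fast are protected.
    # The additive form and alpha are assumed; the abstract gives no formula.
    return weight.abs() + alpha * (weight - prev_weight).abs()

def magnitude_variation_prune(weight, prev_weight, ratio, alpha=0.5):
    # Unstructured pruning: build a 0/1 mask that zeroes out the `ratio`
    # fraction of weights with the lowest importance scores.
    scores = importance_scores(weight, prev_weight, alpha)
    k = int(ratio * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = scores.flatten().kthvalue(k).values
    return (scores > threshold).float()

# Example: prune 90% of a random 64x64 weight tensor.
w_now = torch.randn(64, 64)
w_prev = w_now + 0.01 * torch.randn(64, 64)
mask = magnitude_variation_prune(w_now, w_prev, ratio=0.9)
w_sparse = w_now * mask  # the mask stays fixed while the network retrains

For structured pruning, the same idea would be applied per filter (for example, by summing scores over each filter) rather than per individual weight.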
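The progressive schedule is likewise described only qualitatively: prune heavily at first, then reduce the pruning strength in later rounds. One common schedule with exactly that shape is the cubic gradual-pruning schedule sketched below; whether the thesis uses this particular form is not stated, so it too is an illustrative assumption.

def sparsity_at_round(t, total_rounds, final_ratio):
    # Cubic schedule: sparsity rises steeply in early rounds and flattens
    # toward final_ratio, so later rounds prune with reduced strength.
    # Assumed form; the abstract does not specify the actual schedule.
    frac = min(t / total_rounds, 1.0)
    return final_ratio * (1.0 - (1.0 - frac) ** 3)

# Example: 10 rounds toward 95% sparsity. Round 1 already reaches ~25.7%
# sparsity, while the final round adds under 0.1%.
targets = [sparsity_at_round(t, 10, 0.95) for t in range(1, 11)]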
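Finally, the distillation step trains the sparse student network on soft targets from the original teacher network. Below is a minimal sketch of the standard soft-target loss of Hinton et al. (2015); the temperature T, the mixing weight lam, and the inclusion of a hard-label term are assumed hyperparameter choices, since the abstract specifies none of them.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, lam=0.9):
    # Soft-target term: KL divergence between temperature-softened teacher
    # and student distributions, scaled by T^2 so its gradients are on the
    # same scale as the hard-label term (Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1.0 - lam) * hard

# Example with a dummy batch of 8 samples and 10 CIFAR-10 classes.
s = torch.randn(8, 10)            # student (sparse network) logits
t = torch.randn(8, 10)            # teacher (original network) logits
y = torch.randint(0, 10, (8,))    # ground-truth labels
loss = distillation_loss(s, t, y)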
Keywords/Search Tags: Deep neural network, Model compression and acceleration, Model pruning, Iterative pruning, Knowledge distillation