
Research On Compression Methods For Deep Neural Network Models Based On Parameter Pruning And Sharing

Posted on: 2020-10-24    Degree: Master    Type: Thesis
Country: China    Candidate: G X Xu    Full Text: PDF
GTID: 2428330620456159    Subject: Information and Communication Engineering
Abstract/Summary:
Deep neural networks have high computational complexity and high parameter storage costs, which makes them difficult to deploy on embedded devices with limited hardware resources and tight power budgets. To address this problem, the thesis studies deep neural network compression based on parameter pruning and sharing, focusing on improving the compression ratio of deep neural networks and reducing the computational cost of convolutional layers. The main research work is as follows.

Firstly, the development status of deep neural network compression technology is introduced. The typical approaches to network compression are summarized, namely parameter pruning and sparsification, weight quantization and parameter sharing, and matrix decomposition. The basic principles and structures of deep neural networks are introduced, including feedforward neural networks, feedback neural networks and BP neural networks. The convolutional, pooling and nonlinear layers of the convolutional neural networks used in the experiments are studied, and typical convolutional neural network models are introduced.

Secondly, the thesis compares and analyzes the two main classes of parameter-pruning methods, namely unstructured and structured pruning. It studies two deep neural network compression methods: one based on unstructured pruning and one based on iterative pruning with a Taylor-expansion criterion. Four basic methods of neural network parameter sharing are introduced: randomized parameter sharing, parameter sharing based on hash functions, parameter sharing based on structured matrices, and parameter sharing based on vector quantization. Three network retraining strategies are compared and analyzed: single pruning with retraining, iterative pruning with retraining, and dense-sparse-dense retraining.

Then, the thesis combines a dynamic weight-pruning method with the parameter sharing method. On the one hand, building on parameter pruning, it emphasizes dynamic pruning of the network: by introducing a splicing operation into the pruning process, the important weight connections in the network are retained to the greatest extent and erroneous pruning operations are avoided, which preserves model accuracy and reduces the training time of the model. To further improve accuracy, regularization is added during the pruning process; with L1 regularization, the accuracy of the AlexNet network on the CIFAR-10 data set improves by about 0.4%. On the other hand, after parameter pruning is completed, the weights are quantized with the K-Means method so that parameters are shared, further increasing the compression of the deep neural network. This approach achieves 52× lossless compression of the AlexNet network on the ImageNet dataset.
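As an illustration of the pruning-then-sharing pipeline described above, the following is a minimal NumPy sketch, not the thesis's implementation: `update_mask` imitates dynamic pruning with a splicing step, keeping a binary mask that can both remove connections and re-activate ones whose magnitude grows back, and `kmeans_share` quantizes the surviving weights with a small K-Means loop so that many connections share a single centroid value. The thresholds, cluster count, and helper names are assumptions made for the example.

```python
import numpy as np

def update_mask(weights, mask, t_low, t_high):
    """Dynamic pruning with splicing: prune weights whose magnitude falls
    below t_low, and splice back any weight that grows above t_high."""
    mask = mask.copy()
    mask[np.abs(weights) < t_low] = 0.0   # prune unimportant connections
    mask[np.abs(weights) > t_high] = 1.0  # re-activate important ones
    return mask

def kmeans_share(weights, n_clusters=16, n_iter=20):
    """Parameter sharing via K-Means weight quantization: each surviving
    weight is replaced by its cluster centroid, so only the centroids and
    per-weight cluster indices need to be stored."""
    w = weights[weights != 0.0]                             # non-zero weights only
    centroids = np.linspace(w.min(), w.max(), n_clusters)   # linear initialization
    for _ in range(n_iter):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = w[assign == k].mean()
    idx = np.argmin(np.abs(weights.reshape(-1, 1) - centroids[None, :]), axis=1)
    shared = centroids[idx].reshape(weights.shape) * (weights != 0.0)
    return shared, centroids, idx

# Toy usage on a random dense weight matrix:
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(256, 128))
mask = update_mask(W, np.ones_like(W), t_low=0.05, t_high=0.07)
W_shared, centers, codes = kmeans_share(W * mask, n_clusters=16)
print("kept", int(mask.sum()), "of", W.size, "weights;",
      "distinct values after sharing:", len(np.unique(W_shared)))
```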
Finally, a hybrid pruning compression method that incorporates deep neural network filter pruning is proposed. On the one hand, filters are pruned in a dynamic manner, so that pruned filters continue to participate in updates during model training; this preserves model capacity during pruning and stabilizes model accuracy. Since pruning and training are performed simultaneously, the fine-tuning stage of the usual filter-pruning pipeline is omitted, which saves training time. The dynamic filter pruning method achieves a 40.8% acceleration of convolution operations for the ResNet-110 network on the CIFAR-10 data set with an accuracy loss of only 0.3%. On the other hand, applying parameter sharing to the dynamically filter-pruned network further removes redundant parameters and reduces parameter storage costs. The hybrid compression method reduces the number of connections by 105× with a 70% acceleration of convolution operations for LeNet-5 on the MNIST dataset, and achieves 38× compression with a 73% acceleration of convolution operations for the AlexNet network on the CIFAR-10 dataset, with no loss of accuracy.
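The dynamic (soft) filter-pruning idea in this part can be sketched as follows in PyTorch; this is an illustrative example under assumed names and settings (the `prune_ratio` value, the per-epoch re-pruning schedule, and the helper `soft_prune_filters` are not taken from the thesis). The key point is that pruned filters are only zeroed, not removed, so gradient updates during training can revive them and a separate fine-tuning stage becomes unnecessary.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def soft_prune_filters(conv: nn.Conv2d, prune_ratio: float = 0.3) -> torch.Tensor:
    """Zero the filters with the smallest L1 norm, but keep them in the
    network so later updates can restore them; returns the pruned indices."""
    w = conv.weight.data                       # shape: [out_ch, in_ch, k, k]
    scores = w.abs().sum(dim=(1, 2, 3))        # L1 norm of each output filter
    n_prune = int(prune_ratio * w.size(0))
    pruned = torch.argsort(scores)[:n_prune]   # least important filters
    w[pruned] = 0.0                            # "soft" pruning: zero, don't delete
    if conv.bias is not None:
        conv.bias.data[pruned] = 0.0
    return pruned

# Sketch of how this interleaves with training (model, loader, criterion,
# optimizer are assumed to exist):
# for epoch in range(num_epochs):
#     for x, y in loader:
#         loss = criterion(model(x), y)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
#     for m in model.modules():                # re-select filters every epoch,
#         if isinstance(m, nn.Conv2d):         # so zeroed filters may come back
#             soft_prune_filters(m, prune_ratio=0.3)
```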
Keywords/Search Tags: deep neural network, model compression, parameter pruning and sharing, hybrid pruning compression