
Research On Deep Neural Network Model Compression Method Based On Parameter Pruning

Posted on: 2021-10-31 | Degree: Master | Type: Thesis
Country: China | Candidate: M Zhou | Full Text: PDF
GTID: 2518306476450284 | Subject: Signal and Information Processing
Abstract/Summary:
Deep convolutional neural networks (CNNs) have proven effective in contemporary visual tasks. This effectiveness, however, rests on a huge number of parameters and expensive computing resources, which makes CNNs difficult to deploy on resource-constrained devices. Compressing and accelerating deep neural networks is therefore necessary. Motivated by this demand, this thesis carries out the following work from the perspectives of filter pruning and network quantization.

First, the thesis introduces convolutional neural networks, covering their basic components: the convolution layer, the activation function, the batch normalization (BN) layer, and the fully connected layer. It then surveys compression methods for deep neural networks, including network pruning, network quantization, and compact model design.

Second, starting from the absolute values of the parameters, the filter pruning methods KN and KG, based respectively on the filter norm and the BN layer scaling factor, are analyzed. Building on these, a pruning method KNG is proposed that jointly considers a filter's norm and its corresponding BN scaling factor (a code sketch follows the abstract). Two filter pruning strategies are also proposed: a global adaptive pruning strategy and an equal-probability per-layer pruning strategy. The algorithm was verified with the VGG17 network on the CIFAR-10 dataset; the experimental results show that the KNG algorithm removes 88.54% of the parameters and 50.89% of the computation while improving accuracy by 0.3%.

Third, starting from the similarity between filters within a layer, two similarity-based pruning methods, KAS and KMMD, are proposed. Both use the distance between filters as the criterion for evaluating their similarity. KAS uses a filter's average similarity to the other filters as its importance criterion, so as to preserve the diversity and effectiveness of the remaining filters. KMMD is a maximum-minimum-distance algorithm whose main purpose is to keep the minimum distance between any two retained filters as large as possible (see the selection sketch below). Both algorithms were verified with VGG17 on CIFAR-10 and maintained an accuracy improvement of 0.1%-0.2% at a filter pruning ratio of 0.7. The thesis also introduces soft filter pruning and compares the performance, advantages, and disadvantages of the hard and soft pruning strategies.

Finally, after filter pruning, parameter quantization is used to compress the network further. The thesis first introduces the incremental network quantization algorithm INQ and then proposes an improved variant, EINQ (an INQ-style quantization step is sketched below). Verified with VGG17 on CIFAR-10, EINQ improves accuracy by about 0.3% over INQ. Combining the KNG pruning algorithm with EINQ quantization compresses the weight storage of the convolution layers by a factor of 35 while keeping the accuracy drop within 0.2%.
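As an illustration of the KNG criterion described above, here is a minimal PyTorch sketch. The abstract does not specify how the filter norm and the BN scaling factor are combined, so a simple product is assumed; the function and variable names are hypothetical, not the thesis's actual code.

    # Hypothetical sketch of a KNG-style importance score: each filter's
    # score combines its weight norm with the scaling factor (gamma) of
    # the matching BatchNorm channel. The product is an assumption.
    import torch
    import torch.nn as nn

    def kng_scores(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> torch.Tensor:
        """Score each output filter of `conv` by ||W||_2 * |gamma|."""
        # L2 norm of each filter's weights, one value per output channel
        filter_norms = conv.weight.detach().flatten(1).norm(p=2, dim=1)
        # Absolute BN scaling factor for the same channels
        gammas = bn.weight.detach().abs()
        return filter_norms * gammas

    # Example: rank the filters of one conv/BN pair and keep the top 50%
    conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
    bn = nn.BatchNorm2d(16)
    scores = kng_scores(conv, bn)
    keep = torch.topk(scores, k=8).indices  # indices of filters to retain
    print(sorted(keep.tolist()))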
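In the spirit of the KMMD method described above, the following sketch greedily keeps filters so that the smallest pairwise distance among the kept set stays as large as possible. The greedy scheme, the Euclidean distance, and the norm-based starting point are assumptions; the thesis's exact procedure may differ.

    # Assumed max-min-distance filter selection (greedy approximation).
    import torch

    def max_min_distance_select(filters: torch.Tensor, k: int) -> list[int]:
        """filters: (n, d) flattened filter weights; returns k kept indices."""
        dists = torch.cdist(filters, filters)      # pairwise Euclidean distances
        # Start from the filter with the largest norm (an assumption)
        kept = [int(filters.norm(dim=1).argmax())]
        while len(kept) < k:
            # Each candidate's distance to its nearest already-kept filter
            min_to_kept = dists[:, kept].min(dim=1).values
            min_to_kept[kept] = -1.0               # exclude already-kept filters
            kept.append(int(min_to_kept.argmax()))  # farthest-from-set candidate
        return sorted(kept)

    # Example: keep 5 of 16 random "filters" of dimension 27 (3*3*3)
    weights = torch.randn(16, 27)
    print(max_min_distance_select(weights, k=5))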
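Finally, a rough sketch of one INQ-style quantization step: a fraction of the largest-magnitude weights is snapped to the nearest power of two while the rest stay at full precision for retraining. Real INQ constrains the exponents to a fixed range and includes zero as a level, and the EINQ modifications proposed in the thesis are not reproduced here; this is only an illustration under those simplifications.

    # Simplified INQ-style partition-and-quantize step (not the thesis's EINQ).
    import torch

    def round_to_power_of_two(w: torch.Tensor) -> torch.Tensor:
        """Map each weight to sign(w) * 2^round(log2|w|); zeros stay zero."""
        out = torch.zeros_like(w)
        nz = w != 0
        exp = torch.round(torch.log2(w[nz].abs()))
        out[nz] = torch.sign(w[nz]) * torch.pow(2.0, exp)
        return out

    def inq_partition_step(w: torch.Tensor, fraction: float):
        """Quantize the top `fraction` of weights by magnitude; return the
        mixed tensor and a mask of weights frozen at quantized values."""
        k = int(fraction * w.numel())
        threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
        frozen = w.abs() >= threshold
        mixed = torch.where(frozen, round_to_power_of_two(w), w)
        return mixed, frozen

    weights = torch.randn(4, 4)
    quantized, mask = inq_partition_step(weights, fraction=0.5)
    print(quantized)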
Keywords/Search Tags:DNN network compression, filter pruning, network quantization, soft filter pruning, network acceleration