
Study On Convolutional Neural Network Compression Methods Based On Pruning And Quantization

Posted on: 2020-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: H W Li
GTID: 2428330623451386
Subject: Computer technology

Abstract/Summary:
After years of development, artificial neural networks have evolved into a variety of distinctive structures. Among them, the Convolutional Neural Network (CNN) has attracted wide attention from researchers due to its outstanding performance in computer vision, speech recognition, and natural language processing. However, as CNNs have become more powerful, their models have also grown larger: training is time-consuming and hardware requirements are demanding, which restricts the further development of CNNs. Hence the demand for CNN model compression.

To address these problems, this thesis proposes a CNN model compression method based on a step-by-step pruning strategy. What distinguishes it from earlier methods that compress by reducing the number of CNN weights is the following: when pruning the weights of a layer, the method first selects a portion of the weights currently retained by that layer, sets a threshold according to the selected weight subset, and removes the weights whose absolute value is below the threshold. The weights retained after pruning are retrained to compensate for the precision loss caused by pruning. A portion of the currently retained weights is then again selected for pruning, and the remaining weights are retrained, repeating until the final compression ratio is reached. Compared with general pruning strategies, the step-by-step strategy accounts for the impact that pruning part of a layer's weights has on the importance of the remaining weights, which yields a finer pruning granularity and reduces the precision loss caused by erroneous pruning.

This thesis also proposes a weight quantization method that requires no retraining after quantization: interval quantization. When quantizing weights with this method, the weights of a layer are divided into intervals whose number is determined by the number of bits used to represent the quantized weights, and each weight is represented by the midpoint of the interval in which it falls. By quantizing the weights of a CNN model, the weights are represented with fewer bits, which reduces storage requirements and enables CNN deployment on storage-limited devices such as embedded systems.

Finally, the proposed methods are validated on classical CNN models. Experimental results show that the step-by-step pruning strategy can prune 98.2% and 99.03% of the weights of LeNet-300-100 and LeNet-5, respectively, and that it outperforms most weight-based pruning methods at the same pruning rate. Combining the interval quantization method with the step-by-step pruning strategy compresses the CNN model further: the resulting compression ratio of LeNet-5 is 0.60%, which is 53.49% better than using the step-by-step pruning strategy alone. Moreover, the interval quantization method requires no retraining and is easier to use than other quantization methods.
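The two ideas above can be illustrated with a minimal NumPy sketch: iterative magnitude pruning where each step derives its threshold from a randomly selected subset of the currently retained weights, followed by interval quantization that snaps each weight to the midpoint of its interval. The subset fraction, the median-based threshold rule, and the 4-bit width here are illustrative assumptions, not the thesis's tuned hyperparameters, and the retraining step between pruning rounds is only indicated by a comment.

```python
import numpy as np

def prune_step(weights, mask, subset_fraction=0.5):
    """One pruning step: pick a subset of the currently retained weights,
    set a threshold from that subset, and drop weights below it."""
    retained = weights[mask]
    k = max(1, int(len(retained) * subset_fraction))
    subset = np.random.choice(retained, size=k, replace=False)
    # Illustrative threshold rule: median magnitude of the subset.
    threshold = np.percentile(np.abs(subset), 50)
    return mask & (np.abs(weights) >= threshold)

def stepwise_prune(weights, target_ratio, subset_fraction=0.5):
    """Repeat pruning steps until the fraction of retained weights
    reaches the target compression ratio."""
    mask = np.ones_like(weights, dtype=bool)
    while mask.mean() > target_ratio:
        mask = prune_step(weights, mask, subset_fraction)
        # In the full method, the surviving weights would be retrained
        # here to recover accuracy before the next pruning step.
    return weights * mask, mask

def interval_quantize(weights, bits=4):
    """Interval quantization: split a layer's weight range into 2**bits
    equal intervals and replace each weight by the midpoint of the
    interval it falls in. No retraining is needed afterwards."""
    n_intervals = 2 ** bits
    lo, hi = weights.min(), weights.max()
    edges = np.linspace(lo, hi, n_intervals + 1)
    idx = np.clip(np.digitize(weights, edges) - 1, 0, n_intervals - 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return midpoints[idx]
```

Because each weight moves by at most half an interval width, the quantization error is bounded by the interval size, which is why the method can skip retraining while magnitude pruning cannot.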
Keywords/Search Tags:Convolutional Neural Network, Model Compression, Weights Pruning, Weights Quantization