
Study On Convolutional Neural Network Compression Methods Based On Pruning And Quantization

Posted on: 2020-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: H W Li
GTID: 2428330623451386
Subject: Computer technology

Abstract/Summary:
After years of development, artificial neural networks have evolved into a variety of distinctive structures. Among them, the Convolutional Neural Network (CNN) has attracted wide attention from researchers due to its outstanding performance in computer vision, speech recognition, and natural language processing. However, as CNNs have become more powerful, their models have also grown larger: training is time-consuming and hardware requirements are demanding, which restricts the further development of CNNs. Hence the demand for CNN model compression.

To address these problems, this thesis proposes a CNN model compression method based on a step-by-step pruning strategy. What distinguishes it from earlier methods that compress by reducing the number of CNN weights is the following: when pruning the weights of a layer, the method first selects a portion of the weights currently retained by that layer, sets a threshold according to the selected weight subset, and removes the weights whose absolute value is below the threshold. The weights retained after pruning are retrained to compensate for the precision loss caused by pruning. A portion of the currently retained weights is then again selected for pruning, and the remaining weights are retrained, repeating until the final compression ratio is reached. Compared with general pruning strategies, the step-by-step strategy accounts for the impact that pruning part of a layer's weights has on the importance of the remaining weights, which yields a finer pruning granularity and reduces the precision loss caused by erroneous pruning.

This thesis also proposes a weight quantization method that requires no retraining after quantization: interval quantization. When quantizing weights with this method, the weights of a layer are divided into intervals whose number is determined by the number of bits used to represent the quantized weights, and each weight is represented by the midpoint of the interval in which it falls. By quantizing the weights of a CNN model, the weights are represented with fewer bits, which reduces storage requirements and enables CNN deployment on storage-limited devices such as embedded systems.

Finally, the proposed methods are validated on classical CNN models. Experimental results show that the step-by-step pruning strategy can prune 98.2% and 99.03% of the weights of LeNet-300-100 and LeNet-5, respectively, and that it outperforms most weight-based pruning methods at the same pruning rate. Combining the interval quantization method with the step-by-step pruning strategy compresses the CNN model further: the resulting compression ratio of LeNet-5 is 0.60%, which is 53.49% better than using the step-by-step pruning strategy alone. Moreover, the interval quantization method requires no retraining and is easier to use than other quantization methods.
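The two ideas above can be illustrated with a minimal NumPy sketch: iterative magnitude pruning where each step derives its threshold from a randomly selected subset of the currently retained weights, followed by interval quantization that snaps each weight to the midpoint of its interval. The subset fraction, the median-based threshold rule, and the 4-bit width here are illustrative assumptions, not the thesis's tuned hyperparameters, and the retraining step between pruning rounds is only indicated by a comment.

```python
import numpy as np

def prune_step(weights, mask, subset_fraction=0.5):
    """One pruning step: pick a subset of the currently retained weights,
    set a threshold from that subset, and drop weights below it."""
    retained = weights[mask]
    k = max(1, int(len(retained) * subset_fraction))
    subset = np.random.choice(retained, size=k, replace=False)
    # Illustrative threshold rule: median magnitude of the subset.
    threshold = np.percentile(np.abs(subset), 50)
    return mask & (np.abs(weights) >= threshold)

def stepwise_prune(weights, target_ratio, subset_fraction=0.5):
    """Repeat pruning steps until the fraction of retained weights
    reaches the target compression ratio."""
    mask = np.ones_like(weights, dtype=bool)
    while mask.mean() > target_ratio:
        mask = prune_step(weights, mask, subset_fraction)
        # In the full method, the surviving weights would be retrained
        # here to recover accuracy before the next pruning step.
    return weights * mask, mask

def interval_quantize(weights, bits=4):
    """Interval quantization: split a layer's weight range into 2**bits
    equal intervals and replace each weight by the midpoint of the
    interval it falls in. No retraining is needed afterwards."""
    n_intervals = 2 ** bits
    lo, hi = weights.min(), weights.max()
    edges = np.linspace(lo, hi, n_intervals + 1)
    idx = np.clip(np.digitize(weights, edges) - 1, 0, n_intervals - 1)
    midpoints = (edges[:-1] + edges[1:]) / 2
    return midpoints[idx]
```

Because each weight moves by at most half an interval width, the quantization error is bounded by the interval size, which is why the method can skip retraining while magnitude pruning cannot.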
Keywords/Search Tags:Convolutional Neural Network, Model Compression, Weights Pruning, Weights Quantization