
Research On Weight Compression Method Of CNN Based On Integer Coefficient Representation

Posted on: 2020-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Meng
GTID: 2428330620956210
Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional neural network (CNN) model compression is an effective way to reduce the parameter redundancy and storage footprint of CNNs. This thesis focuses on compression methods based on weight pruning and weight quantization.

Chapter 1 introduces the research background and the state of the art in CNN compression, and then outlines the content and structure of the thesis.

Chapter 2 covers the basic theory of CNNs. It first describes the building blocks of a CNN, including the convolution layer, the fully connected layer, activation functions, the pooling layer, and SoftMax (a minimal Keras example follows this abstract). It then reviews common optimization techniques, including gradient descent, error back-propagation, parameter initialization, batch normalization, and methods for avoiding under-fitting and over-fitting. It also introduces the classic CNN models LeNet-5, AlexNet, VGGNet, and ResNet, and the common training frameworks TensorFlow, Caffe, and Keras.

Chapter 3 analyzes several classical CNN compression methods, covering weight quantization, model pruning, and compact model design. It first analyzes BinaryConnect, a compression method based on binary quantization, and then introduces two methods based on ternary quantization, TWN and TTQ (a ternary sketch follows this abstract). It also analyzes a compression method based on 8-bit weight quantization and applies it to ResNet and LeNet-5; this method compresses the models by a factor of four while the loss in model accuracy is less than 1% (an 8-bit sketch follows this abstract). The chapter compares these quantization methods, then introduces channel pruning based on the Taylor expansion and the gamma coefficient, and finally presents MobileNet as a compression method based on model design.

Chapter 4 first studies the distribution of CNN weights and the numerical formats computers use to store numbers. Based on the weight distribution and the characteristics of floating-point and fixed-point formats, a CNN compression method based on Integer Coefficient Representation (ICR) is proposed: 8-bit fixed-point integer coefficients are used instead of 32-bit floating-point numbers to store the weights (a sketch of one possible format follows this abstract), and the weight-update strategy is modified accordingly. The chapter then compares the effect of a step-by-step iteration strategy for the integer coefficient representation on the results, as well as the effects of different weight-selection and regularization strategies. Using step-by-step iteration, maximum-absolute-value-first weight selection, and an L2 regularization term to quantize the CNN, the proposed ICR algorithm compresses the storage space of the CNN model by a factor of four, while the accuracy on ResNet is 0.27% higher than that of the original network and on LeNet-5 it is 0.14% higher. Finally, ICR is compared with other quantization methods.

Chapter 5 first introduces several classical pruning methods and then analyzes the weight-distribution characteristics of the sparse networks obtained after pruning. It then proposes a CNN compression method that combines the pruning algorithm with integer coefficient representation. When quantizing sparse CNNs, the network can dynamically restore some important pruned connections, which avoids possible performance degradation (a sketch follows this abstract). Without counting the location information of the sparse network weights, ICR compresses ResNet by a factor of about 12 and LeNet-5 by a factor of about 78, with no decrease in model accuracy.
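As a reference for the building blocks Chapter 2 surveys, the following is a minimal LeNet-5-style model in Keras, one of the frameworks the thesis reviews. The layer sizes follow the classic LeNet-5; the activation and pooling choices here are illustrative, not the thesis's exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# LeNet-5-style CNN: convolution, pooling, fully connected, and SoftMax layers.
model = keras.Sequential([
    layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(10, activation="softmax"),   # SoftMax classifier output
])
model.compile(optimizer="sgd",                # gradient descent via back-propagation
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```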
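Chapter 3's ternary methods constrain each weight to three values. Below is a minimal sketch in the spirit of TWN, assuming the commonly cited magnitude threshold of roughly 0.7 times the mean absolute weight and a single per-layer scale alpha; both settings are assumptions, and the thesis may analyze a different variant.

```python
import numpy as np

def ternarize(w):
    """TWN-style sketch: quantize a weight tensor to {-alpha, 0, +alpha}."""
    delta = 0.7 * np.mean(np.abs(w))                        # magnitude threshold (assumed)
    mask = np.abs(w) > delta                                # connections kept non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-layer scale
    return alpha * np.sign(w) * mask
```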
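The 8-bit scheme analyzed in Chapter 3 replaces each 32-bit weight with a signed 8-bit integer plus a shared scale, which is where the factor-of-four storage reduction comes from. A minimal symmetric per-tensor sketch (illustrative; the exact scheme applied to ResNet and LeNet-5 may differ):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor 8-bit quantization: w ~= scale * q."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-12)           # map largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return scale * q.astype(np.float32)                     # back to float for inference
```

Storing q at one byte per weight instead of a 4-byte float gives the 4x compression quoted above.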
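The abstract does not spell out the exact ICR format, so the following is only one plausible reading: each weight is stored as an 8-bit integer coefficient k together with a shared per-layer power-of-two exponent e, so that w ≈ k · 2^(-e). The exponent choice is an assumption, and the thesis's step-by-step iteration and weight-selection strategies are not reproduced here.

```python
import numpy as np

def icr_quantize(w):
    """Sketch of integer coefficient representation: w ~= k * 2**(-e),
    with int8 coefficients k and one shared exponent e per layer."""
    # Pick e so the largest |w| still fits in an 8-bit coefficient.
    e = int(np.floor(np.log2(127.0 / (np.max(np.abs(w)) + 1e-12))))
    k = np.clip(np.round(w * 2.0 ** e), -127, 127).astype(np.int8)
    return k, e

def icr_dequantize(k, e):
    return k.astype(np.float32) * 2.0 ** (-e)
```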
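Chapter 5's combination of pruning and ICR lets important pruned connections recover during quantization. A minimal sketch of such a mask update, in the spirit of dynamic-surgery-style pruning; the two thresholds and the recovery rule are assumptions.

```python
import numpy as np

def update_mask(w, mask, prune_th, restore_th):
    """Prune weak weights, but dynamically restore pruned connections
    whose magnitude has regrown (restore_th > prune_th)."""
    mask = mask.copy()
    mask[np.abs(w) < prune_th] = 0.0     # prune unimportant connections
    mask[np.abs(w) > restore_th] = 1.0   # restore important pruned ones
    return mask

# Training keeps updating the dense w; forward passes use w * mask,
# so a pruned weight can grow back and be restored by the next update.
```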
Keywords/Search Tags: CNN, Deep learning, Model compression, Integer coefficient representation, Parameter pruning