
Research On Weight Compression Method Of CNN Based On Integer Coefficient Representation

Posted on: 2020-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Meng
GTID: 2428330620956210
Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional neural network (CNN) model compression is an effective way to reduce the parameter redundancy and storage footprint of CNNs. This thesis focuses on compression methods based on weight pruning and weight quantization.

Chapter 1 introduces the research background and the state of the art in CNN compression, and then outlines the content and structure of the thesis.

Chapter 2 covers the basic theory of CNNs. It first describes the building blocks of a CNN, including the convolution layer, the fully connected layer, activation functions, the pooling layer, and SoftMax (a minimal Keras example follows this abstract). It then reviews common optimization techniques, including gradient descent, error back-propagation, parameter initialization, batch normalization, and methods for avoiding under-fitting and over-fitting. It also introduces the classic CNN models LeNet-5, AlexNet, VGGNet, and ResNet, and the common training frameworks TensorFlow, Caffe, and Keras.

Chapter 3 analyzes several classical CNN compression methods, covering weight quantization, model pruning, and compact model design. It first analyzes BinaryConnect, a compression method based on binary quantization, and then introduces two methods based on ternary quantization, TWN and TTQ (a ternary sketch follows this abstract). It also analyzes a compression method based on 8-bit weight quantization and applies it to ResNet and LeNet-5; this method compresses the models by a factor of four while the loss in model accuracy is less than 1% (an 8-bit sketch follows this abstract). The chapter compares these quantization methods, then introduces channel pruning based on the Taylor expansion and the gamma coefficient, and finally presents MobileNet as a compression method based on model design.

Chapter 4 first studies the distribution of CNN weights and the numerical formats computers use to store numbers. Based on the weight distribution and the characteristics of floating-point and fixed-point formats, a CNN compression method based on Integer Coefficient Representation (ICR) is proposed: 8-bit fixed-point integer coefficients are used instead of 32-bit floating-point numbers to store the weights (a sketch of one possible format follows this abstract), and the weight-update strategy is modified accordingly. The chapter then compares the effect of a step-by-step iteration strategy for the integer coefficient representation on the results, as well as the effects of different weight-selection and regularization strategies. Using step-by-step iteration, maximum-absolute-value-first weight selection, and an L2 regularization term to quantize the CNN, the proposed ICR algorithm compresses the storage space of the CNN model by a factor of four, while the accuracy on ResNet is 0.27% higher than that of the original network and on LeNet-5 it is 0.14% higher. Finally, ICR is compared with other quantization methods.

Chapter 5 first introduces several classical pruning methods and then analyzes the weight-distribution characteristics of the sparse networks obtained after pruning. It then proposes a CNN compression method that combines the pruning algorithm with integer coefficient representation. When quantizing sparse CNNs, the network can dynamically restore some important pruned connections, which avoids possible performance degradation (a sketch follows this abstract). Without counting the location information of the sparse network weights, ICR compresses ResNet by a factor of about 12 and LeNet-5 by a factor of about 78, with no decrease in model accuracy.
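As a reference for the building blocks Chapter 2 surveys, the following is a minimal LeNet-5-style model in Keras, one of the frameworks the thesis reviews. The layer sizes follow the classic LeNet-5; the activation and pooling choices here are illustrative, not the thesis's exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# LeNet-5-style CNN: convolution, pooling, fully connected, and SoftMax layers.
model = keras.Sequential([
    layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(10, activation="softmax"),   # SoftMax classifier output
])
model.compile(optimizer="sgd",                # gradient descent via back-propagation
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```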
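Chapter 3's ternary methods constrain each weight to three values. Below is a minimal sketch in the spirit of TWN, assuming the commonly cited magnitude threshold of roughly 0.7 times the mean absolute weight and a single per-layer scale alpha; both settings are assumptions, and the thesis may analyze a different variant.

```python
import numpy as np

def ternarize(w):
    """TWN-style sketch: quantize a weight tensor to {-alpha, 0, +alpha}."""
    delta = 0.7 * np.mean(np.abs(w))                        # magnitude threshold (assumed)
    mask = np.abs(w) > delta                                # connections kept non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-layer scale
    return alpha * np.sign(w) * mask
```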
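The 8-bit scheme analyzed in Chapter 3 replaces each 32-bit weight with a signed 8-bit integer plus a shared scale, which is where the factor-of-four storage reduction comes from. A minimal symmetric per-tensor sketch (illustrative; the exact scheme applied to ResNet and LeNet-5 may differ):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor 8-bit quantization: w ~= scale * q."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-12)           # map largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return scale * q.astype(np.float32)                     # back to float for inference
```

Storing q at one byte per weight instead of a 4-byte float gives the 4x compression quoted above.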
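The abstract does not spell out the exact ICR format, so the following is only one plausible reading: each weight is stored as an 8-bit integer coefficient k together with a shared per-layer power-of-two exponent e, so that w ≈ k · 2^(-e). The exponent choice is an assumption, and the thesis's step-by-step iteration and weight-selection strategies are not reproduced here.

```python
import numpy as np

def icr_quantize(w):
    """Sketch of integer coefficient representation: w ~= k * 2**(-e),
    with int8 coefficients k and one shared exponent e per layer."""
    # Pick e so the largest |w| still fits in an 8-bit coefficient.
    e = int(np.floor(np.log2(127.0 / (np.max(np.abs(w)) + 1e-12))))
    k = np.clip(np.round(w * 2.0 ** e), -127, 127).astype(np.int8)
    return k, e

def icr_dequantize(k, e):
    return k.astype(np.float32) * 2.0 ** (-e)
```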
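Chapter 5's combination of pruning and ICR lets important pruned connections recover during quantization. A minimal sketch of such a mask update, in the spirit of dynamic-surgery-style pruning; the two thresholds and the recovery rule are assumptions.

```python
import numpy as np

def update_mask(w, mask, prune_th, restore_th):
    """Prune weak weights, but dynamically restore pruned connections
    whose magnitude has regrown (restore_th > prune_th)."""
    mask = mask.copy()
    mask[np.abs(w) < prune_th] = 0.0     # prune unimportant connections
    mask[np.abs(w) > restore_th] = 1.0   # restore important pruned ones
    return mask

# Training keeps updating the dense w; forward passes use w * mask,
# so a pruned weight can grow back and be restored by the next update.
```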
Keywords/Search Tags: CNN, Deep learning, Model compression, Integer coefficient representation, Parameter pruning