With the widespread use of convolutional neural networks in various fields, the total number of their parameters is growing exponentially, and the resulting storage and computational costs seriously hinder their further use and deployment on embedded and mobile devices. This paper therefore studies the compression of convolutional neural networks; the main contributions are as follows.

(1) The concept of an "edge point" is defined, and a convolutional neural network compression method based on network pruning and weight quantization is proposed. Using the LeNet model under the Caffe framework, the compression effect of the method is tested on the Conv1 and Conv2 layers of LeNet on the MNIST data set. The experimental results show that the Conv1 layer is compressed from 1.95 KB to 0.069 KB (about 28 times), the Conv2 layer from 97.66 KB to 3.06 KB (about 32 times), and the two layers together from 99.61 KB to 3.13 KB (about 32 times), while the accuracy of the model remains essentially unchanged.

(2) On top of network pruning and weight quantization, Huffman coding is introduced to improve the proposed compression method, further enhancing the compression effect by reducing the storage space of the non-zero weight indexes. Using the CIFAR-10 example model of the Caffe framework, the compression effect is tested on the Conv1 and Conv2 layers of the CIFAR-10 network. The experimental results show that the Conv1 layer, previously compressed from 9.3 KB to 0.507 KB, is further compressed to 0.481 KB: the weight-index width drops from a fixed 3 bits to an average of 2.72 bits, and the compression ratio improves from 18.3 to 19.3 times. The Conv2 layer, previously compressed from 100 KB to 3.85 KB, is further compressed to 3.66 KB: the weight-index width drops from 4 bits to an average of 3.16 bits, and the compression ratio improves from 26 to 27.3 times. The Conv1 and Conv2 layers are simultaneously
compressed from 109.38 KB to 4.35 KB and further to 4.17 KB, and the compression ratio improves from 25.1 to 26.2 times. The accuracy of the model remains essentially unchanged.

In summary, in order to reduce the number of network parameters and improve the efficiency of network computation, this paper proposes a convolutional neural network compression method and demonstrates it experimentally with an analysis of the results. The results show that the proposed method is feasible and effective on at least two convolutional neural network models.
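The three-stage pipeline described above (magnitude pruning, shared-value quantization, Huffman coding of the quantization indexes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: the function names are invented for the example, a simple uniform quantizer stands in for the thesis's "edge point"-based quantization, and the Huffman step only measures the average bits per index rather than emitting an encoded stream.

```python
import heapq
from collections import Counter


def prune(weights, threshold):
    """Magnitude pruning: zero every weight whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize(weights, n_clusters):
    """Uniformly quantize the non-zero weights into n_clusters shared values.

    Returns (indexes, codebook): each non-zero weight is replaced by a small
    index into the codebook, so only the indexes and the codebook are stored.
    """
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    step = (hi - lo) / n_clusters or 1.0  # avoid /0 when all weights are equal
    codebook = [lo + (i + 0.5) * step for i in range(n_clusters)]
    indexes = [min(int((w - lo) / step), n_clusters - 1) for w in nonzero]
    return indexes, codebook


def huffman_avg_bits(symbols):
    """Average code length (bits per symbol) of a Huffman code for the stream."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single symbol still needs one bit
        return 1.0
    heap = [[f, [s, ""]] for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        for pair in a[1:]:
            pair[1] = "0" + pair[1]  # left branch prepends a 0 bit
        for pair in b[1:]:
            pair[1] = "1" + pair[1]  # right branch prepends a 1 bit
        heapq.heappush(heap, [a[0] + b[0]] + a[1:] + b[1:])
    codes = {s: c for s, c in heap[0][1:]}
    return sum(freq[s] * len(codes[s]) for s in freq) / len(symbols)
```

Huffman coding only pays off because the quantization indexes are not uniformly distributed: frequent indexes get short codes, so the average width can fall below the fixed width (for example, below the fixed 4 bits reported for the Conv2 layer above).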