
Deep Neural Network Compression Method Based On Product Quantization

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Fang
Full Text: PDF
GTID: 2428330611953493
Subject: Control engineering
Abstract/Summary:
In recent years, deep neural networks have achieved outstanding results in fields such as computer vision and natural language processing. However, as networks grow deeper and more over-parameterized, their size, computation, power consumption, and storage requirements increase rapidly, which ultimately lowers the computational efficiency of very large architectures. This is especially problematic for applications with limited computing resources, such as web services and mobile or embedded devices. Because such architectures are computationally expensive and occupy a great deal of memory, they cannot be deployed on devices with scarce memory (such as mobile phones) or on mobile terminals.

To address this, this thesis proposes a deep compression method, applicable to a variety of deep neural networks, that combines pruning with product quantization to reduce both the redundant parameters and the storage footprint of the model. The first step is to build and train the network and save the best parameter model. The second step is pruning: a threshold scan determines a cutoff, all weight connections whose magnitude falls below it are removed from the weight matrices to eliminate redundant parameters, and the network is then retrained to recover the lost performance. Finally, product quantization clustering is used to achieve weight sharing: at the cost of a small loss of accuracy, the model parameters are quantized to 8 bits, which reduces storage overhead and improves the network's compression ratio and acceleration ratio.

In a GPU-based PyTorch environment, the proposed method compressed the LeNet-5, MLP, AlexNet, ResNet, and VGGNet-16 models by factors of 23 to 59 on datasets such as MNIST and CIFAR-10 without loss of accuracy, while achieving speedups of 1.52 to 3.3 times; the product quantization step itself was more than twice as fast as K-means quantization. The experimental results show that combining product quantization with pruning greatly improves the compression and acceleration ratios of convolutional neural networks, making it feasible to deploy deep neural networks on embedded platforms.
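To make the two core operations of the pipeline concrete, the following Python/PyTorch fragment is a minimal sketch of magnitude pruning against a scanned threshold followed by product quantization of the weights into 8-bit codebook indices. It is an illustration of the general technique, not the thesis's own code: the function names, the sub-vector length of 4, the codebook size of 256 (which is what makes the indices fit in 8 bits), and the k-means iteration count are all assumptions chosen for the example.

```python
import torch


def prune_by_threshold(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out connections whose magnitude falls below the threshold."""
    mask = weight.abs() >= threshold
    return weight * mask


def product_quantize(weight: torch.Tensor, sub_dim: int = 4,
                     n_codes: int = 256, iters: int = 20):
    """Split each row into sub-vectors and cluster them with k-means.

    With n_codes = 256, every sub-vector of sub_dim floats is replaced
    by a single 8-bit index into a shared codebook, which is where the
    storage saving comes from. Values here are illustrative defaults.
    """
    rows, cols = weight.shape
    assert cols % sub_dim == 0, "columns must divide evenly into sub-vectors"
    subvecs = weight.reshape(-1, sub_dim)  # (rows * cols / sub_dim, sub_dim)

    # Plain Lloyd's k-means over the sub-vectors, initialized from samples.
    init = torch.randperm(subvecs.size(0))[:n_codes]
    codebook = subvecs[init].clone()
    for _ in range(iters):
        dists = torch.cdist(subvecs, codebook)   # pairwise L2 distances
        codes = dists.argmin(dim=1)              # nearest centroid per sub-vector
        for k in range(n_codes):
            members = subvecs[codes == k]
            if members.numel() > 0:              # skip empty clusters
                codebook[k] = members.mean(dim=0)

    return codes.to(torch.uint8), codebook       # 8-bit indices + codebook


def dequantize(codes: torch.Tensor, codebook: torch.Tensor,
               shape: torch.Size, sub_dim: int = 4) -> torch.Tensor:
    """Reconstruct an approximate weight matrix from codes and codebook."""
    return codebook[codes.long()].reshape(shape)


# Usage sketch on a random weight matrix standing in for a trained layer:
W = torch.randn(512, 256)
W = prune_by_threshold(W, threshold=0.05)
codes, book = product_quantize(W)
W_hat = dequantize(codes, book, W.shape)
```

Under these illustrative settings, each float32 sub-vector of length 4 (128 bits) is replaced by one 8-bit index, i.e. roughly a 16x reduction for the quantized weights before accounting for the small codebook and the pruning mask; this is the mechanism behind the compression ratios the abstract reports, though the thesis's exact configuration may differ.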
Keywords/Search Tags: CNN, network compression, pruning, product quantization, vector quantization