
Deep Neural Network Compression Method Based On Product Quantization

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Fang
Full Text: PDF
GTID: 2428330611953493
Subject: Control engineering
Abstract/Summary:
In recent years, deep neural networks have achieved outstanding results in fields such as computer vision and natural language processing. However, as networks grow deeper and more over-parameterized, their size, computation, power consumption, and storage requirements increase rapidly, which ultimately lowers the computational efficiency of very large architectures. This is especially problematic for applications with limited computing resources, such as web services and mobile or embedded devices. Because such architectures are computationally expensive and occupy a great deal of memory, they cannot be deployed on devices with scarce memory (such as mobile phones) or on mobile terminals.

To address this, this thesis proposes a deep compression method, applicable to a variety of deep neural networks, that combines pruning with product quantization to reduce both the redundant parameters and the storage footprint of the model. The first step is to build and train the network and save the best parameter model. The second step is pruning: a threshold scan determines a cutoff, all weight connections whose magnitude falls below it are removed from the weight matrices to eliminate redundant parameters, and the network is then retrained to recover the lost performance. Finally, product quantization clustering is used to achieve weight sharing: at the cost of a small loss of accuracy, the model parameters are quantized to 8 bits, which reduces storage overhead and improves the network's compression ratio and acceleration ratio.

In a GPU-based PyTorch environment, the proposed method compressed the LeNet-5, MLP, AlexNet, ResNet, and VGGNet-16 models by factors of 23 to 59 on datasets such as MNIST and CIFAR-10 without loss of accuracy, while achieving speedups of 1.52 to 3.3 times; the product quantization step itself was more than twice as fast as K-means quantization. The experimental results show that combining product quantization with pruning greatly improves the compression and acceleration ratios of convolutional neural networks, making it feasible to deploy deep neural networks on embedded platforms.
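To make the two core operations of the pipeline concrete, the following Python/PyTorch fragment is a minimal sketch of magnitude pruning against a scanned threshold followed by product quantization of the weights into 8-bit codebook indices. It is an illustration of the general technique, not the thesis's own code: the function names, the sub-vector length of 4, the codebook size of 256 (which is what makes the indices fit in 8 bits), and the k-means iteration count are all assumptions chosen for the example.

```python
import torch


def prune_by_threshold(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out connections whose magnitude falls below the threshold."""
    mask = weight.abs() >= threshold
    return weight * mask


def product_quantize(weight: torch.Tensor, sub_dim: int = 4,
                     n_codes: int = 256, iters: int = 20):
    """Split each row into sub-vectors and cluster them with k-means.

    With n_codes = 256, every sub-vector of sub_dim floats is replaced
    by a single 8-bit index into a shared codebook, which is where the
    storage saving comes from. Values here are illustrative defaults.
    """
    rows, cols = weight.shape
    assert cols % sub_dim == 0, "columns must divide evenly into sub-vectors"
    subvecs = weight.reshape(-1, sub_dim)  # (rows * cols / sub_dim, sub_dim)

    # Plain Lloyd's k-means over the sub-vectors, initialized from samples.
    init = torch.randperm(subvecs.size(0))[:n_codes]
    codebook = subvecs[init].clone()
    for _ in range(iters):
        dists = torch.cdist(subvecs, codebook)   # pairwise L2 distances
        codes = dists.argmin(dim=1)              # nearest centroid per sub-vector
        for k in range(n_codes):
            members = subvecs[codes == k]
            if members.numel() > 0:              # skip empty clusters
                codebook[k] = members.mean(dim=0)

    return codes.to(torch.uint8), codebook       # 8-bit indices + codebook


def dequantize(codes: torch.Tensor, codebook: torch.Tensor,
               shape: torch.Size, sub_dim: int = 4) -> torch.Tensor:
    """Reconstruct an approximate weight matrix from codes and codebook."""
    return codebook[codes.long()].reshape(shape)


# Usage sketch on a random weight matrix standing in for a trained layer:
W = torch.randn(512, 256)
W = prune_by_threshold(W, threshold=0.05)
codes, book = product_quantize(W)
W_hat = dequantize(codes, book, W.shape)
```

Under these illustrative settings, each float32 sub-vector of length 4 (128 bits) is replaced by one 8-bit index, i.e. roughly a 16x reduction for the quantized weights before accounting for the small codebook and the pruning mask; this is the mechanism behind the compression ratios the abstract reports, though the thesis's exact configuration may differ.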
Keywords/Search Tags: CNN, network compression, pruning, product quantization, vector quantization