
Research On Exponential Quantization Compression Of Deep Neural Networks

Posted on: 2020-11-11    Degree: Master    Type: Thesis
Country: China    Candidate: C Rao    Full Text: PDF
GTID: 2428330578473894    Subject: Communication and Information System
Abstract/Summary:
In recent years, Deep Neural Networks (DNNs) have stood out among machine learning methods and attracted wide interest and attention. Compared with traditional machine learning algorithms, deep neural networks have massive numbers of parameters and complex network connections, which allow deep learning models to learn feature representations adaptively; as a result, in common computer vision tasks (e.g., image classification and object detection), deep neural networks tend to outperform traditional machine learning algorithms. However, these redundant parameters and connections also make deep learning models consume enormous storage and computing resources, so some resource-constrained platforms cannot run them effectively, causing lag and overheating and possibly even damaging the device. This makes deep neural networks difficult to deploy widely on mobile and embedded devices such as mobile phones. To address the problem of deploying deep neural networks on resource-constrained platforms, model compression is a common approach, mainly including model pruning, knowledge distillation, low-rank decomposition, compact model architecture design, and model quantization. Some of these methods are cumbersome to implement and require tuning a large number of hyperparameters; some also change the original network structure. More importantly, these algorithms often cause a certain loss of performance. The goal of model compression is to greatly reduce the storage footprint of a model with little or no loss of performance, so that deep models can be deployed efficiently on resource-constrained platforms. To facilitate shift operations on embedded systems, this paper quantizes deep neural networks by exponential quantization: each weight in the model is constrained to a power of two, 2^n (with integer n). The
compression method is simple: the network structure is not changed during compression, and the model still converges normally. When the model is saved, only the codebook and the index of each weight in the codebook are stored, which effectively reduces the storage space of the deep learning model without degrading its performance. The main contributions of this paper are as follows: 1) By surveying existing quantization-based compression methods, this paper selects exponential quantization as its quantization method, and shows through analysis and derivation that the larger the absolute value of a weight, the larger its quantization error; this observation motivates the exponent optimization strategy. 2) The deep model is compressed by exponential quantization coding: the codebook of the compressed model is derived from the full-precision model, so that the weights are updated within the codebook and finally converge. Experiments show that exponential quantization compression can effectively approximate the initial full-precision model, compressing the model to about 1/10 of its original size with little impact on performance. 3) Exponential quantization compression is improved into dynamic exponential quantization compression coding, which updates the model codebook together with the model parameters, so that the codebook reduces, as much as possible, the quantization error caused by large weights. Compared with static exponential quantization compression, this method does not require a pre-trained model for initialization, and the resulting compressed model improves substantially in both training speed and recognition accuracy. 4) Various factors affecting the performance of exponential quantization are discussed: the effects of different regularizations and codebooks on model quantization are compared and analyzed. The
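The power-of-two quantization described above can be sketched as follows. This is a minimal illustration, assuming an exponent range of [-7, 0] and a simple round-to-nearest-exponent rule; the exponent bounds, the handling of zeros, and the function name are illustrative choices, not the thesis's exact algorithm:

```python
import numpy as np

def exp_quantize(w, n_min=-7, n_max=0):
    """Quantize each weight to sign(w) * 2**n with integer n in [n_min, n_max].

    The saved model then needs only the small codebook
    {2**n_min, ..., 2**n_max} plus one index per weight, and
    multiplication by a power of two becomes a bit shift on
    fixed-point embedded hardware.
    """
    sign = np.sign(w)
    mag = np.maximum(np.abs(w), 2.0 ** n_min)          # avoid log2(0)
    n = np.clip(np.round(np.log2(mag)), n_min, n_max).astype(np.int8)
    return sign * 2.0 ** n, n

w = np.array([0.3, -0.6, 0.0])
q, idx = exp_quantize(w)
# q   -> [0.25, -0.5, 0.0]   (zeros stay zero because sign(0) == 0)
# idx -> [-2, -1, -7]
```

With a sign bit plus a 3-bit exponent index, each weight occupies 4 bits instead of 32, which is the same order of compression as the roughly 1/10 ratio reported above once codebook and bookkeeping overhead are included.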
effectiveness of the method is further demonstrated by a large number of comparative experiments. The method proposed in this paper achieves nearly lossless compression, works well across many models, and is therefore highly practical.
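The dynamic exponential quantization idea, updating the codebook alongside the model parameters, could look roughly like the sketch below. The codebook-update rule shown (re-deriving the exponent range from the current largest weight magnitude) and both function names are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def refresh_codebook(w, n_levels=8):
    """Re-derive the exponent range so the largest-magnitude weight
    (the dominant source of quantization error) stays representable."""
    n_max = int(np.ceil(np.log2(np.max(np.abs(w)))))
    return n_max - n_levels + 1, n_max

def quantize(w, n_min, n_max):
    # round each weight to the nearest power of two within the codebook
    sign = np.sign(w)
    mag = np.maximum(np.abs(w), 2.0 ** n_min)          # avoid log2(0)
    n = np.clip(np.round(np.log2(mag)), n_min, n_max)
    return sign * 2.0 ** n

# one "training step": after the weights change, refresh the codebook
# before re-quantizing, so no pre-trained model is needed to fix the
# codebook in advance
w = np.array([0.9, -0.12, 0.031])
n_min, n_max = refresh_codebook(w)       # exponent range tracks max |w|
q = quantize(w, n_min, n_max)
# q -> [1.0, -0.125, 0.03125]
```

Because the exponent range follows the current weight distribution, large weights are never clipped to a stale codebook boundary, which is one plausible reading of how the dynamic scheme reduces the error contributed by large-magnitude parameters.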
Keywords/Search Tags:deep neural networks, resource-constrained platform, exponential quantization compression, dynamic exponential quantization compression