
Mixed-precision Quantization Methods For Convolutional Neural Network Compression

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Bao
Full Text: PDF
GTID: 2518306503480274
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the successful application of deep convolutional neural networks (CNNs) in various computer vision tasks, researchers design deeper or wider networks to surpass existing classic methods and achieve better performance. Most state-of-the-art convolutional networks need tens of megabytes of weight storage and billions of floating-point operations to perform a single forward inference, which makes them difficult to deploy widely on resource-constrained edge devices. Quantization is considered one of the most effective ways to meet the memory requirements of edge devices: it reduces model size by replacing the 32-bit floating-point numbers in weights, activations, and gradients with lower bit-width representations. However, most quantization methods assign a uniform bit-width to all network layers. When a deep neural network is compressed to very low precision, a few sensitive layers may severely reduce its accuracy. A better strategy is therefore to adopt a heterogeneous bit-width allocation scheme, a research topic known as mixed-precision quantization.

Existing work on mixed-precision quantization suffers from several drawbacks, such as high complexity and uncertainty in the bit-width allocation. Our research starts from an analysis of local quantization noise and connects layer importance with a dynamic notion of quantization sensitivity. Moreover, based on the premise that quantization noise is equivalent to a small perturbation near a local equilibrium point, our method assigns bit-widths by successively decreasing the representation precision of individual layers, so that the final bit-width allocation scheme is unique.

We prove that feature maps amplify small quantization perturbations of the weights, and that the degradation of network accuracy is directly caused by the resulting layer-to-layer differences in feature maps. We therefore propose that layer-wise quantization should aim to reconstruct the feature map, adjusting the quantization centroids produced by a traditional quantizer. We derive approximate estimates of the quantized feature-map error and optimize them iteratively with the alternating direction method of multipliers (ADMM). Building on this single-layer noise analysis, we propose a quantization sensitivity measure under small perturbations: the lower a layer's quantization sensitivity, the higher its quantization priority. The overall weight bit-width allocation algorithm is a stepwise precision reduction guided by this "feature map alignment" criterion; once the target compression ratio is reached, the allocation process stops, and the scheme for any lower compression ratio can be derived from the recorded history log. Given the characteristic function of the quantization error, we further propose a framework for allocating activation bit-widths under the constraint of the chosen weight precision. Experiments on mainstream neural networks show that our method achieves better results than related works.
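To make the storage argument concrete, the following minimal Python sketch shows how replacing 32-bit floating-point weights with a b-bit grid trades precision for size. It assumes a symmetric uniform quantizer with one scale per tensor; the function name and the toy tensor are illustrative, not the thesis's implementation.

import numpy as np

def quantize_uniform(w, bits):
    """Fake-quantize float32 weights onto a symmetric b-bit integer grid."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized weights

w = np.random.randn(64, 64).astype(np.float32)
for b in (8, 4, 2):
    err = np.abs(w - quantize_uniform(w, b)).mean()
    print(f"{b}-bit storage = {b / 32:.1%} of float32, mean |error| = {err:.4f}")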
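The feature-map reconstruction idea can be sketched as follows. In place of the ADMM optimization used in the thesis, this toy version adjusts the quantization centroids by plain gradient descent on the layer-output error ||X(W - W_q)||^2, with the nearest-centroid assignment held fixed; the shapes, learning rate, and iteration count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))              # calibration activations
W = rng.standard_normal((32, 16))               # full-precision weights

K = 4                                           # 2-bit -> 4 centroids
c = np.quantile(W, np.linspace(0.1, 0.9, K))    # initial centroid values
assign = np.abs(W[..., None] - c).argmin(-1)    # nearest-centroid index per weight

lr = 1e-4
for _ in range(300):
    Wq = c[assign]                              # current quantized weights
    # gradient of ||X @ (W - Wq)||_F^2 w.r.t. Wq, averaged per centroid
    G = 2.0 * X.T @ (X @ (Wq - W))
    for k in range(K):
        c[k] -= lr * G[assign == k].mean()

Wq = c[assign]
print("weight error     :", np.linalg.norm(W - Wq))
print("feature-map error:", np.linalg.norm(X @ (W - Wq)))

With the assignment fixed, the objective is a convex least-squares problem in the centroid values, which is one reason an alternating scheme such as ADMM is a natural fit for the full method.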
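Finally, the sensitivity-guided stepwise precision reduction can be sketched as a greedy loop with a history log. Here the sensitivity of a candidate step is measured as the change in the network's output feature map caused by dropping one layer's bit-width in isolation; the toy linear network, the average-bit-width stopping rule, and all names are illustrative assumptions rather than the thesis's exact procedure.

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((128, 32))
layers = [rng.standard_normal((32, 32)) for _ in range(4)]   # toy network

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def forward(ws):
    h = X
    for w in ws:
        h = np.maximum(h @ w, 0.0)        # ReLU
    return h

def output_with_bits(bits):
    return forward([quantize(w, b) for w, b in zip(layers, bits)])

bits = [8] * len(layers)
ref = output_with_bits(bits)              # reference feature map
target_avg_bits = 4.0                     # stand-in for a compression-ratio target
history = []                              # log of (layer, new_bits) steps

while sum(bits) / len(bits) > target_avg_bits:
    # sensitivity of each candidate step = feature-map change it causes
    best, best_err = None, np.inf
    for i in range(len(layers)):
        if bits[i] <= 2:
            continue
        trial = bits.copy()
        trial[i] -= 1
        err = np.linalg.norm(output_with_bits(trial) - ref)
        if err < best_err:
            best, best_err = i, err
    bits[best] -= 1                       # quantize the least sensitive layer further
    history.append((best, bits[best]))

print("final bit-widths:", bits)
print("history log     :", history)

Because every precision-reduction step is appended to the log, replaying any prefix of the log reproduces the allocation scheme for the corresponding intermediate compression ratio, with no extra search.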
Keywords/Search Tags: Mobile multimedia, Compression, Quantization, Bit-width scheme