The performance of convolutional neural networks (CNNs) is entangled with their heavy memory and computation cost, which has become the bottleneck limiting the exploitation of their full potential. Compressing and accelerating CNNs has therefore received ever-increasing research attention. This thesis analyzes quantization errors and proposes corresponding optimization methods. The innovations are summarized as follows:

(1) Automatic progressive mixed-precision network quantization. Most existing mixed-precision quantization approaches focus on the search algorithm; however, the large search space restricts the algorithm's efficiency and efficacy, and the performance estimator is computationally intensive. From the perspective of progressive optimization, this thesis analyzes the quantization clipping error and rounding error: for a given model, the clipping error is a constant, whereas the rounding error is a function of the precision. Based on this observation, a progressive quantization algorithm is proposed. Furthermore, this thesis proposes a Hessian-aware indicator that uses the second moment from the Adam optimizer as a proxy for the Hessian, which reduces the computational cost of estimating Hessian information. The method obtains a mixed-precision model that satisfies the hardware constraints in an end-to-end manner. Mathematical derivation and comparative experiments demonstrate its effectiveness.

(2) Distribution-aware quantization via a parameterized max scale. The discrete activation distributions of CNNs are not quantization-friendly. This thesis analyzes the activation distribution from the perspective of hyper-parameter optimization: gradient information is used to adaptively perceive the upper bound of the dynamic quantization range, yielding a hardware-friendly quantization algorithm with a learnable max scale. Moreover, to improve the performance of the quantized model, this thesis proposes a structured knowledge transfer loss, which transfers structured knowledge from the full-precision model to the low-bit one; it enhances the learning of spatial correlations and improves the information flow between network layers. Its performance exceeds that of competing methods.
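The two error sources analyzed above, and the role of the max scale as a clipping upper bound, can be illustrated with a minimal NumPy sketch of a uniform fake-quantizer. This is an assumption-laden toy, not the thesis's implementation: the learnable max scale appears here as a plain float `s` (the hypothetical name is ours), and the gradient-based learning of `s` is omitted.

```python
import numpy as np

def fake_quantize(x, s, num_bits=8):
    """Uniform fake-quantization of non-negative activations.

    s is the clipping upper bound (the "max scale" of the text,
    shown as a constant here rather than a learned parameter).
    """
    qmax = 2 ** num_bits - 1
    step = s / qmax                  # quantization step size
    x_clipped = np.clip(x, 0.0, s)   # clipping error originates here
    q = np.round(x_clipped / step)   # rounding error originates here
    return q * step                  # dequantized value

# Example: ReLU activations; 5.0 exceeds s and is clipped,
# in-range values incur at most half a step of rounding error.
x = np.array([0.0, 0.3, 1.7, 5.0])
y = fake_quantize(x, s=2.0, num_bits=4)
```

Note how the sketch separates the two errors the thesis analyzes: the clipping error depends only on `s` (constant for a fixed model and scale), while the rounding error shrinks as `num_bits` grows, consistent with treating it as a function of precision.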