The performance of convolutional neural networks (CNNs) is entangled with their heavy memory and computation cost, which has become the bottleneck limiting the exploitation of their full potential. Compressing and accelerating CNNs has therefore received ever-increasing research attention. This thesis analyzes quantization errors and proposes corresponding optimization methods. The innovations are summarized as follows:

(1) Automatic progressive mixed-precision network quantization. Most existing mixed-precision quantization approaches focus on the search algorithm; however, the large search space restricts the algorithm's efficiency and efficacy, and the performance estimator is computationally intensive. From the perspective of progressive optimization, this thesis analyzes the quantization clipping error and rounding error: for a given model, the clipping error is a constant, whereas the rounding error is a function of the precision. Based on this observation, a progressive quantization algorithm is proposed. Furthermore, this thesis proposes a Hessian-aware indicator that uses the second moment from the Adam optimizer as a proxy for the Hessian, which reduces the computational cost of estimating Hessian information. The method obtains a mixed-precision model that satisfies the hardware constraints in an end-to-end manner. Mathematical derivation and comparative experiments demonstrate its effectiveness.

(2) Distribution-aware quantization via a parameterized max scale. The discrete activation distributions of CNNs are not quantization-friendly. This thesis analyzes the activation distribution from the perspective of hyper-parameter optimization: gradient information is used to adaptively perceive the upper bound of the dynamic quantization range, yielding a hardware-friendly quantization algorithm with a learnable max scale. Moreover, to improve the performance of the quantized model, this thesis proposes a structured knowledge transfer loss, which transfers structured knowledge from the full-precision model to the low-bit one; it enhances the learning of spatial correlations and improves the information flow between network layers. Its performance exceeds that of competing methods.
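The two error sources analyzed above, and the role of the max scale as a clipping upper bound, can be illustrated with a minimal NumPy sketch of a uniform fake-quantizer. This is an assumption-laden toy, not the thesis's implementation: the learnable max scale appears here as a plain float `s` (the hypothetical name is ours), and the gradient-based learning of `s` is omitted.

```python
import numpy as np

def fake_quantize(x, s, num_bits=8):
    """Uniform fake-quantization of non-negative activations.

    s is the clipping upper bound (the "max scale" of the text,
    shown as a constant here rather than a learned parameter).
    """
    qmax = 2 ** num_bits - 1
    step = s / qmax                  # quantization step size
    x_clipped = np.clip(x, 0.0, s)   # clipping error originates here
    q = np.round(x_clipped / step)   # rounding error originates here
    return q * step                  # dequantized value

# Example: ReLU activations; 5.0 exceeds s and is clipped,
# in-range values incur at most half a step of rounding error.
x = np.array([0.0, 0.3, 1.7, 5.0])
y = fake_quantize(x, s=2.0, num_bits=4)
```

Note how the sketch separates the two errors the thesis analyzes: the clipping error depends only on `s` (constant for a fixed model and scale), while the rounding error shrinks as `num_bits` grows, consistent with treating it as a function of precision.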