
Study of Mixed Precision Quantization of Convolutional Neural Networks

Posted on: 2023-08-13    Degree: Master    Type: Thesis
Country: China    Candidate: J H Lv    Full Text: PDF
GTID: 2558306845498984    Subject: Signal and Information Processing

Abstract/Summary:
Convolutional neural networks have been applied successfully to image recognition, object detection, and semantic segmentation. Improvements in network performance are usually accompanied by increases in network depth, and hence in the number of parameters and the amount of computation. Deploying neural networks on hardware with limited resources therefore requires network compression. As the mainstream approach to network compression, neural network quantization offers good compression and acceleration, but it introduces a loss of accuracy between the full-precision and quantized models. Existing methods aim to achieve higher compression ratios while improving the performance of the quantized network. At present, most quantization methods allocate the same bit-width to every layer of the network, and ultra-low bit-width quantization leads to a significant drop in model accuracy. Based on the different contributions that different layers make to the network, this thesis assigns a different quantization bit-width to each layer and designs and implements mixed precision quantization algorithms that trade off accuracy against complexity. The main contributions of this thesis are as follows:

(1) To address the severe accuracy loss of low bit-width quantization, this thesis proposes a model quantization method based on quantization-aware assisted learning. The quantized network is optimized in stages. First, adaptive initialization guides the quantized network toward a convergent state; then a quantization-aware knowledge distillation network is constructed for auxiliary training to enhance the performance of the quantized network; further, feature attention assistance is added so that information from the intermediate feature maps helps optimize the quantized network. Quantization-aware training reduces the distance between the predictions and intermediate feature maps of the quantized network and those of the full-precision network, which improves the classification performance of the low bit-width quantized network. A sketch of such an assisted training objective is given below.
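The following is a minimal sketch, not the thesis implementation, of an assisted training loss of this kind: a full-precision teacher guides the low bit-width student through a softened-logit distillation term and an attention term over intermediate feature maps. The names `teacher`/`student`, the weights `alpha` and `beta`, and the temperature `T` are illustrative assumptions.

```python
# Sketch of a quantization-aware assisted training loss (assumed names/weights).
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse a feature map (N, C, H, W) into a normalized spatial attention map."""
    att = feat.pow(2).mean(dim=1)               # (N, H, W): channel-wise energy
    return F.normalize(att.flatten(1), dim=1)   # L2-normalize per sample

def assisted_loss(student_logits, teacher_logits, student_feats, teacher_feats,
                  labels, alpha=0.5, beta=1e3, T=4.0):
    # Standard task loss on the quantized (student) network.
    ce = F.cross_entropy(student_logits, labels)
    # Quantization-aware distillation: match the teacher's softened predictions.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # Feature-attention assistance: pull intermediate maps toward the teacher's.
    att = sum(F.mse_loss(attention_map(s), attention_map(t))
              for s, t in zip(student_feats, teacher_feats))
    return ce + alpha * kd + beta * att
```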
(2) To address the fact that quantizing different layers affects the efficiency and accuracy of the whole network differently, this thesis proposes a mixed precision quantization method based on differentiable search, which automatically searches for the optimal bit-width of the weights and activations of each layer. When the search is driven only by the classification loss, every layer tends to choose a larger bit-width, so a model complexity constraint is added to achieve a trade-off between complexity and accuracy; at the same time, to improve the performance of the searched network, salient features of the full-precision network are used to constrain the search process; finally, mixed precision quantization fine-tuning is performed with the searched bit-width combination. Experimental results show that the mixed precision quantized network achieves higher compression while maintaining classification accuracy. (A sketch of the differentiable search step follows contribution (3) below.)

(3) To address the possible inconsistency between the optimization goals of the mixed precision search task and the quantization task, this thesis proposes a mixed precision search quantization method based on task consistency, in which the key parameters of the search and of the quantization are optimized jointly. First, a shiftable parameterized activation function replaces the traditional activation function, and this dynamic activation function matches the different bit-width configuration of each layer so as to optimize the distribution of the mixed precision quantized network; second, a layer fusion strategy during training is proposed to remove the effect of layer fusion on accuracy. Experimental results show that keeping the search and quantization tasks consistent improves the performance of mixed precision quantized networks.
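The differentiable bit-width search of contribution (2) can be illustrated as follows. This is a minimal sketch under assumed names (`CANDIDATE_BITS`, `SearchableQuantConv`, the complexity weight), not the thesis code: each convolution keeps architecture parameters over a set of candidate bit-widths, mixes the fake-quantized outputs with softmax weights, and exposes a differentiable bit-cost that is added to the classification loss to balance accuracy against complexity.

```python
# Sketch of a differentiable per-layer bit-width search (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_BITS = (2, 4, 8)

def fake_quant(x, bits):
    """Uniform fake quantization with a straight-through estimator."""
    scale = x.detach().abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    q = torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return x + (q - x).detach()   # forward: quantized values, backward: identity

class SearchableQuantConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        # One architecture parameter per candidate bit-width, learned by gradient descent.
        self.alpha = nn.Parameter(torch.zeros(len(CANDIDATE_BITS)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        # Mix the outputs obtained with each candidate weight bit-width.
        return sum(wi * F.conv2d(x, fake_quant(self.conv.weight, b),
                                 self.conv.bias, padding=self.conv.padding[0])
                   for wi, b in zip(w, CANDIDATE_BITS))

    def expected_bits(self):
        # Differentiable complexity proxy: the expected bit-width of this layer.
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * b for wi, b in zip(w, CANDIDATE_BITS))

# Search objective (conceptually): classification loss + lambda * sum of
# expected_bits() over all searchable layers, so a larger bit-width is kept
# only where it measurably pays off in accuracy.
```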
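The shiftable parameterized activation of contribution (3) can be sketched as a PACT-style learnable clipping level combined with a learnable shift, quantized to the bit-width assigned to the layer. The class and parameter names below (`ShiftablePACT`, `clip`, `shift`) are illustrative assumptions, not the thesis's exact formulation.

```python
# Sketch of a shiftable parameterized activation with learnable clip and shift.
import torch
import torch.nn as nn

class ShiftablePACT(nn.Module):
    def __init__(self, bits, init_clip=6.0):
        super().__init__()
        self.bits = bits
        self.clip = nn.Parameter(torch.tensor(init_clip))   # learnable upper bound
        self.shift = nn.Parameter(torch.tensor(0.0))         # learnable shift

    def forward(self, x):
        levels = 2 ** self.bits - 1
        # Shift, then clip into [0, clip]; both parameters train with the network.
        y = torch.clamp(x - self.shift, min=0.0)
        y = torch.minimum(y, self.clip.abs())
        scale = self.clip.abs() / levels
        q = torch.round(y / scale) * scale
        return y + (q - y).detach()   # straight-through estimator for the rounding
```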
Keywords: Convolutional Neural Network, Mixed Precision Quantization, Image Recognition, Model Compression, Quantization-Aware Training