
Research On Deep Learning Model Quantization And Related Compression Technologies

Posted on: 2021-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhou
Full Text: PDF
GTID: 2518306503972639
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of deep convolutional neural networks (CNNs), image classification, object detection, and semantic segmentation have achieved great breakthroughs in recent years. At the same time, the number of parameters and the amount of computation required by CNNs keep increasing, making it challenging to deploy these networks on resource-constrained hardware. It is therefore necessary to study model compression algorithms that compress existing CNN models and minimize their memory usage and computational cost. We study CNN model compression and acceleration algorithms, analyze existing compression methods, and complete two pieces of work on model quantization.

First, we quantize CNN parameters to fixed-point numbers to compress the model size and improve efficiency. Based on the parameter distributions of CNN models, we propose a quantization method built on scale-factor estimation. We then use this method to quantize ICNet, a real-time semantic segmentation network, applying channel-wise weight quantization and a progressive quantization strategy to reduce the accuracy loss. Our 4-bit quantized ICNet loses 4% accuracy, while the model size is reduced by a factor of 8.

Second, we study the deployment and implementation of 8-bit networks. For pavement detection tasks, we use the TensorRT inference accelerator to optimize the network and deploy it on GPU devices. Because TensorRT cannot realize our quantization method, we implement INT8 forward inference ourselves based on CUDA programming and the cuDNN neural network acceleration library. Experimental results show that, compared with the full-precision network, our framework doubles the inference speed while the accuracy of the quantized network is only about 0.3% lower than that of the original network.
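The abstract does not detail how the scale factors are estimated. As an illustration only, the following is a minimal NumPy sketch of channel-wise fixed-point weight quantization using a simple max-abs estimator per output channel; the function names and the estimator itself are assumptions, not the thesis's actual method.

```python
import numpy as np

def quantize_weights(w: np.ndarray, num_bits: int = 4):
    """Quantize a conv weight tensor (out_ch, in_ch, kh, kw) channel-wise.

    Returns int8 codes and per-channel scale factors such that
    w is approximately codes * scales.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for signed 4-bit
    # One scale per output channel, estimated from that channel's range
    # (hypothetical max-abs estimator; the thesis's estimator is not given).
    max_abs = np.abs(w).reshape(w.shape[0], -1).max(axis=1)
    scales = np.maximum(max_abs / qmax, 1e-8)   # guard all-zero channels
    codes = np.round(w / scales[:, None, None, None])
    codes = np.clip(codes, -qmax - 1, qmax).astype(np.int8)
    return codes, scales

def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Map integer codes back to float, e.g. for simulated quantization."""
    return codes.astype(np.float32) * scales[:, None, None, None]
```

A 4-bit setting is consistent with the 8x size reduction reported above (32-bit floats compressed to 4-bit codes).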
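Likewise, the custom INT8 forward path is described only at a high level. The sketch below shows one common way such a path can be structured: lowering a convolution (stride 1, no padding) to im2col plus an integer GEMM with int32 accumulation, followed by per-channel requantization. All names here are hypothetical; the thesis's CUDA/cuDNN implementation is not reproduced.

```python
import numpy as np

def conv2d_int8(x_q, w_q, x_scale, w_scales):
    """x_q: int8 input (c, h, w); w_q: int8 weights (oc, c, kh, kw).

    Accumulates in int32, then rescales to float per output channel,
    mirroring how an INT8 GPU path keeps the expensive math in integers.
    """
    oc, c, kh, kw = w_q.shape
    _, h, w = x_q.shape
    oh, ow = h - kh + 1, w - kw + 1
    # im2col: gather the kh*kw*c patch under each output position.
    cols = np.zeros((c * kh * kw, oh * ow), dtype=np.int32)
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x_q[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    # Integer GEMM with int32 accumulation (the cheap part on INT8 hardware).
    acc = w_q.reshape(oc, -1).astype(np.int32) @ cols
    # Requantize: one float multiply per output channel.
    out = acc.astype(np.float32) * (x_scale * w_scales)[:, None]
    return out.reshape(oc, oh, ow)
```

On a GPU, the GEMM would be dispatched to an INT8 kernel (e.g. via cuDNN), which is where the roughly 2x speedup reported above would come from; the NumPy version only illustrates the data flow.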
Keywords/Search Tags: Deep Learning, Convolutional Neural Networks, Model Quantization, Semantic Segmentation