
Research On Model Compression And Acceleration For Convolutional Neural Network Under Resource-constrained Scenarios

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2428330611967018
Subject: Software engineering

Abstract/Summary:
In recent years, convolutional neural networks (CNNs) have achieved great success in image classification, object detection, face detection, face recognition, and other tasks. However, existing networks contain a large number of parameters and incur high computational costs, which makes them hard to deploy on resource-constrained devices such as mobile phones, drones, and AR glasses. It is therefore important to study model compression and acceleration for CNNs. In this paper, we focus on network quantization. By converting full-precision weights and activations into low-precision ones, the memory consumption of the network can be greatly reduced; at the same time, floating-point multiplications in CNNs can be replaced by fixed-point multiplications, which greatly reduces the computational overhead. However, existing network quantization and training methods still have the following issues: (1) training a low-precision network is challenging, resulting in a significant performance drop of the quantized network; (2) the weights and activations of the network follow non-uniform distributions, which existing uniform quantization methods struggle to fit; (3) selecting a proper learning rate for training CNNs is difficult.

In this paper, we propose three methods to address these issues. For issue (1), we propose a knowledge-based network quantization method to tackle the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations. We first introduce the Kullback-Leibler (KL) divergence to measure the difference between the output distributions of the full-precision model and the low-precision model, and then train the two networks jointly. In this way, the full-precision model provides hints that guide the training of the low-precision model and reduce its performance drop. For issue (2), we propose a non-uniform quantization method to fit the non-uniform distributions of weights and activations. We replace the quantization discretizer with a trainable one and train the quantizer and the network in an end-to-end manner, which greatly improves the performance of the quantized network. For issue (3), we propose a line search-based optimization method that automatically searches for an appropriate learning rate, so that the network converges to a better local minimum and the model's performance improves. Extensive experimental results demonstrate that the proposed methods effectively improve network performance.
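To make the first idea concrete, the following is a minimal sketch of KL-divergence-guided joint training of a full-precision model and a low-precision model. The loss weighting `alpha`, the temperature `T`, and the function names are illustrative assumptions, not the thesis' exact formulation.

```python
# Sketch: the full-precision model guides the low-precision model via a KL term.
import torch
import torch.nn.functional as F

def distillation_step(fp_model, lp_model, optimizer, images, labels,
                      alpha=0.5, T=4.0):
    """One joint training step for the quantized (low-precision) model."""
    fp_logits = fp_model(images)   # full-precision outputs (guidance)
    lp_logits = lp_model(images)   # low-precision outputs

    # Standard cross-entropy on the ground-truth labels.
    ce_loss = F.cross_entropy(lp_logits, labels)

    # KL divergence between the softened output distributions of the two models.
    kl_loss = F.kl_div(
        F.log_softmax(lp_logits / T, dim=1),
        F.softmax(fp_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = (1.0 - alpha) * ce_loss + alpha * kl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```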
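For the second idea, a non-uniform quantizer can be sketched as a module whose quantization levels are learnable parameters, trained end-to-end with a straight-through estimator. The number of levels, the initialization, and the class name are assumptions for illustration only.

```python
# Sketch: a non-uniform quantizer with trainable quantization levels.
import torch
import torch.nn as nn

class LearnableQuantizer(nn.Module):
    def __init__(self, num_levels=4, init_range=1.0):
        super().__init__()
        # The levels are free parameters, so the quantizer can adapt to the
        # non-uniform distribution of weights or activations.
        self.levels = nn.Parameter(torch.linspace(-init_range, init_range, num_levels))

    def forward(self, x):
        # Snap each value to its nearest quantization level (hard assignment).
        dist = (x.unsqueeze(-1) - self.levels).abs()   # [..., num_levels]
        idx = dist.argmin(dim=-1)
        x_q = self.levels[idx]
        # Straight-through estimator: the forward pass outputs x_q, while the
        # backward pass treats the rounding as identity with respect to x;
        # the levels still receive gradients through the gathered values x_q.
        return x_q + (x - x.detach())
```

In practice such a quantizer would be applied to the weights before each convolution and to the activations after each nonlinearity, and trained jointly with the network.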
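For the third idea, one simple way to search for a learning rate is backtracking (Armijo) line search along the negative gradient; using plain Armijo backtracking here is an illustrative choice, and the constants and function name are assumptions rather than the thesis' exact procedure.

```python
# Sketch: pick the step size for one SGD update by backtracking line search.
import torch

def armijo_line_search_step(model, loss_fn, images, labels,
                            lr_init=1.0, shrink=0.5, c=1e-4, max_trials=10):
    params = [p for p in model.parameters() if p.requires_grad]

    loss = loss_fn(model(images), labels)
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum((g * g).sum() for g in grads)   # squared gradient norm

    lr = lr_init
    orig = [p.detach().clone() for p in params]
    for _ in range(max_trials):
        # Try a step of size lr along the negative gradient.
        with torch.no_grad():
            for p, p0, g in zip(params, orig, grads):
                p.copy_(p0 - lr * g)
            new_loss = loss_fn(model(images), labels)
        # Armijo sufficient-decrease condition: accept lr if the loss drops enough.
        if new_loss <= loss - c * lr * grad_sq:
            break
        lr *= shrink
    # The accepted (or last tried) step is left applied to the model parameters.
    return lr, new_loss.item()
```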
Keywords/Search Tags:Model Compression and Acceleration, Network Quantization, Line Search