
Research On Model Compression And Acceleration For Convolutional Neural Network Under Resource-constrained Scenarios

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2428330611967018
Subject: Software engineering

Abstract/Summary:
In recent years, convolutional neural networks (CNNs) have achieved great success in image classification, object detection, face detection, face recognition, and other tasks. However, existing networks contain a large number of parameters and incur high computational costs, which makes them hard to deploy on resource-constrained devices such as mobile phones, drones, and AR glasses. It is therefore important to study model compression and acceleration for CNNs. In this paper, we focus on network quantization. By converting full-precision weights and activations into low-precision ones, the memory consumption of the network can be greatly reduced; at the same time, floating-point multiplications in CNNs can be replaced by fixed-point multiplications, which greatly reduces the computational overhead. However, existing network quantization and training methods still have the following issues: (1) training a low-precision network is challenging, resulting in a significant performance drop of the quantized network; (2) the weights and activations of the network follow non-uniform distributions, which existing uniform quantization methods struggle to fit; (3) selecting a proper learning rate for training CNNs is difficult.

In this paper, we propose three methods to address these issues. For issue (1), we propose a knowledge-based network quantization method to tackle the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations. We first introduce the Kullback-Leibler (KL) divergence to measure the difference between the output distributions of the full-precision model and the low-precision model, and then train the two networks jointly. In this way, the full-precision model provides hints that guide the training of the low-precision model and reduce its performance drop. For issue (2), we propose a non-uniform quantization method to fit the non-uniform distributions of weights and activations. We replace the quantization discretizer with a trainable one and train the quantizer and the network in an end-to-end manner, which greatly improves the performance of the quantized network. For issue (3), we propose a line search-based optimization method that automatically searches for an appropriate learning rate, so that the network converges to a better local minimum and the model's performance improves. Extensive experimental results demonstrate that the proposed methods effectively improve network performance.
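To make the first idea concrete, the following is a minimal sketch of KL-divergence-guided joint training of a full-precision model and a low-precision model. The loss weighting `alpha`, the temperature `T`, and the function names are illustrative assumptions, not the thesis' exact formulation.

```python
# Sketch: the full-precision model guides the low-precision model via a KL term.
import torch
import torch.nn.functional as F

def distillation_step(fp_model, lp_model, optimizer, images, labels,
                      alpha=0.5, T=4.0):
    """One joint training step for the quantized (low-precision) model."""
    fp_logits = fp_model(images)   # full-precision outputs (guidance)
    lp_logits = lp_model(images)   # low-precision outputs

    # Standard cross-entropy on the ground-truth labels.
    ce_loss = F.cross_entropy(lp_logits, labels)

    # KL divergence between the softened output distributions of the two models.
    kl_loss = F.kl_div(
        F.log_softmax(lp_logits / T, dim=1),
        F.softmax(fp_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = (1.0 - alpha) * ce_loss + alpha * kl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```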
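For the second idea, a non-uniform quantizer can be sketched as a module whose quantization levels are learnable parameters, trained end-to-end with a straight-through estimator. The number of levels, the initialization, and the class name are assumptions for illustration only.

```python
# Sketch: a non-uniform quantizer with trainable quantization levels.
import torch
import torch.nn as nn

class LearnableQuantizer(nn.Module):
    def __init__(self, num_levels=4, init_range=1.0):
        super().__init__()
        # The levels are free parameters, so the quantizer can adapt to the
        # non-uniform distribution of weights or activations.
        self.levels = nn.Parameter(torch.linspace(-init_range, init_range, num_levels))

    def forward(self, x):
        # Snap each value to its nearest quantization level (hard assignment).
        dist = (x.unsqueeze(-1) - self.levels).abs()   # [..., num_levels]
        idx = dist.argmin(dim=-1)
        x_q = self.levels[idx]
        # Straight-through estimator: the forward pass outputs x_q, while the
        # backward pass treats the rounding as identity with respect to x;
        # the levels still receive gradients through the gathered values x_q.
        return x_q + (x - x.detach())
```

In practice such a quantizer would be applied to the weights before each convolution and to the activations after each nonlinearity, and trained jointly with the network.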
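For the third idea, one simple way to search for a learning rate is backtracking (Armijo) line search along the negative gradient; using plain Armijo backtracking here is an illustrative choice, and the constants and function name are assumptions rather than the thesis' exact procedure.

```python
# Sketch: pick the step size for one SGD update by backtracking line search.
import torch

def armijo_line_search_step(model, loss_fn, images, labels,
                            lr_init=1.0, shrink=0.5, c=1e-4, max_trials=10):
    params = [p for p in model.parameters() if p.requires_grad]

    loss = loss_fn(model(images), labels)
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum((g * g).sum() for g in grads)   # squared gradient norm

    lr = lr_init
    orig = [p.detach().clone() for p in params]
    for _ in range(max_trials):
        # Try a step of size lr along the negative gradient.
        with torch.no_grad():
            for p, p0, g in zip(params, orig, grads):
                p.copy_(p0 - lr * g)
            new_loss = loss_fn(model(images), labels)
        # Armijo sufficient-decrease condition: accept lr if the loss drops enough.
        if new_loss <= loss - c * lr * grad_sq:
            break
        lr *= shrink
    # The accepted (or last tried) step is left applied to the model parameters.
    return lr, new_loss.item()
```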
Keywords/Search Tags:Model Compression and Acceleration, Network Quantization, Line Search