
Deep Neural Network Compression And Acceleration

Posted on: 2021-10-23    Degree: Master    Type: Thesis
Country: China    Candidate: J S Li    Full Text: PDF
GTID: 2518306308970089    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep neural networks (DNNs) have been widely used in image classification, object detection, speech recognition, and other fields. However, the huge storage and computing requirements of DNNs seriously hinder their deployment on resource-constrained mobile and embedded devices. As two important branches of network lightweighting, channel pruning and network quantization can significantly compress and accelerate deep neural networks.

Many channel pruning works consider only the statistical information of the current layer when pruning the channels of a given layer, ignoring the correlations between consecutive layers. Such separated channel pruning methods cannot maximally remove the redundant channels in a network and also incur a certain accuracy loss after pruning. This thesis proposes an Out-In-Channel Pruning (OICP) algorithm to overcome these shortcomings. OICP takes the out-in-channel as the smallest pruning unit and fully exploits the correlations between consecutive layers, both when adding structural regularization constraints and when selecting redundant channels. In OICP, a global greedy pruning algorithm automatically determines the pruning ratio of each layer and removes redundant out-in-channels iteratively. The effectiveness of OICP has been verified on various DNN architectures such as ResNet, DenseNet, PreSeNet, and MobileNetV2, and OICP achieves state-of-the-art compression results on the ImageNet-1K image classification task: removing 37.2% of the FLOPs of ResNet-50 yields a pruned network whose accuracy is 0.22% higher than that of the original network, while removing 50% of the FLOPs of ResNet-50 causes only a 0.37% Top-1 accuracy drop.

Neural network quantization achieves compression and acceleration by replacing the floating-point parameters and activation values of the original network with low-bit values. Many quantization methods quantize all floating-point parameters and activation values to low-bit values at once (one-shot quantization), which causes severe network turbulence and makes it difficult for the quantized network to converge and reach the accuracy of the original network. This thesis proposes an incremental quantization algorithm to address these shortcomings. Incremental quantization reduces network fluctuations during the quantization process by quantizing network parameters and activation values iteratively: in each quantization iteration, only part of the parameters and activation values are selected and quantized, and to further reduce fluctuations, the weights and activation values quantized in different iterations are kept disjoint along the out-channel dimension. Experiments show that incremental quantization effectively resolves the convergence difficulty of quantized-network training and improves the accuracy of the quantized network. Quantizing ResNet-18 to 3-bit values on the ImageNet-1K dataset with incremental quantization yields an accuracy 2.03% higher than that of one-shot quantization.
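To make the out-in-channel idea concrete, the following is a minimal PyTorch sketch, not the thesis implementation, of scoring coupled channel units and greedily selecting the globally least important ones. The weight-norm scoring, the layer_pairs pairing of consecutive convolutions, and the single-shot pruning ratio are illustrative assumptions.

    import torch

    def oic_scores(conv_curr, conv_next):
        # Score each out-in-channel unit: the c-th output filter of the current
        # conv layer together with the c-th input slice of the next conv layer.
        # conv_curr.weight: (C_out, C_in, k, k); conv_next.weight: (C_out2, C_out, k, k)
        out_norm = conv_curr.weight.detach().flatten(1).norm(dim=1)
        in_norm = conv_next.weight.detach().transpose(0, 1).flatten(1).norm(dim=1)
        return out_norm + in_norm

    def global_greedy_prune(layer_pairs, prune_ratio=0.372):
        # Gather scores of all out-in-channels across the network and greedily
        # mark the globally smallest ones, so per-layer pruning ratios are
        # determined automatically rather than set by hand.
        scored = []
        for idx, (curr, nxt) in enumerate(layer_pairs):
            for c, s in enumerate(oic_scores(curr, nxt)):
                scored.append((s.item(), idx, c))
        scored.sort()  # most redundant (smallest score) first
        n_prune = int(prune_ratio * len(scored))
        to_prune = {}
        for _, idx, c in scored[:n_prune]:
            to_prune.setdefault(idx, []).append(c)
        return to_prune  # {layer index: out-in-channels to remove}

In the thesis, selection is iterative and coupled with structural regularization during training; this sketch shows only a simplified, globally ranked greedy selection step.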
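The incremental quantization procedure can likewise be sketched in PyTorch. The uniform symmetric quantizer, the number of incremental steps, and the equal-size partition of out-channels below are assumptions for illustration only; the point shown is that each iteration quantizes a disjoint group of out-channels while the rest remain in floating point until later steps.

    import torch

    def uniform_quantize(x, bits=3):
        # Simple symmetric uniform quantizer, used here only for illustration.
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax, qmax) * scale

    def incremental_weight_quantization(weight, num_steps=4, bits=3):
        # Incrementally quantize a conv weight tensor of shape (C_out, C_in, k, k):
        # each step quantizes a disjoint slice of out-channels; the remaining
        # channels stay in floating point until their turn.
        groups = torch.arange(weight.shape[0]).chunk(num_steps)  # disjoint out-channel groups
        w = weight.clone()
        for idx in groups:
            w[idx] = uniform_quantize(w[idx], bits)
            # In the full procedure, the still-floating-point part of the network
            # would be fine-tuned here before the next quantization step.
        return w

Calling incremental_weight_quantization(conv.weight.data) replaces the weights group by group, which is what keeps the quantization-induced perturbation small at every step compared with quantizing everything at once.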
Keywords/Search Tags:Neural Network Lightweighting, Channel Pruning, Out-In-Channel, Incremental Quantization