
Research On Compression Methods For Deep Convolutional Neural Networks

Posted on: 2022-08-23    Degree: Master    Type: Thesis
Country: China    Candidate: B H Zhu    Full Text: PDF
GTID: 2518306605489464    Subject: Master of Engineering
Abstract/Summary:
Deep convolutional neural networks currently show strong performance on many computer vision tasks. Ever-deeper stacks of network layers give these models powerful feature processing capability, but they also leave the models with excessive storage and computation costs, which hinders deployment on lightweight devices. How to compress deep convolutional neural networks has therefore attracted considerable attention in academia.

Although research on neural network model compression has made progress, many problems remain to be solved. For example, most pruning methods empirically choose the Lp norm as the measure of a convolution kernel's importance and ignore how the Lp norms of the kernels in the same convolutional layer are distributed. This implicitly assumes that the Lp norms are widely scattered, so that redundant kernels can be singled out for pruning. In most deep convolutional neural networks, however, the Lp norms of the kernels in a layer are concentrated in a narrow range, which makes redundant kernels hard to identify. As another example, in knowledge distillation the addition of an attention mechanism does improve knowledge transfer, but the conventional way of generating attention maps uses all feature map channels, implicitly assuming that every channel contributes equally to the model. In fact, the data show clear differences in feature intensity across channels, so attention maps generated this way cannot fully exploit the advantages of the attention mechanism.

Starting from these problems, this thesis studies deep convolutional neural network compression from the two aspects of pruning and knowledge distillation and proposes corresponding solutions. The specific contributions are as follows:

(1) We propose a feature map dispersion pruning algorithm, which measures the importance of a convolution kernel through the dispersion of the feature map channels it generates over a batch of input images, instead of designing an importance criterion on kernel parameters that offer little overall discrimination. Compared with pruning algorithms that score importance on the kernel itself, the feature map dispersion criterion judges a kernel's ability to extract the key features of the input data from how much its feature maps vary, and the distributional properties of feature maps widen the gaps between importance scores, so redundant kernels are located more accurately. The algorithm introduces no additional hyperparameters, reduces manual effort, and requires only a small amount of computation, so it can be applied to any mainstream deep convolutional neural network model. To reduce the impact of pruning on model accuracy, we also adopt a cyclic pruning framework that smooths the rate at which the model structure changes. Experimental results on image classification datasets show that the feature map dispersion pruning algorithm outperforms existing pruning algorithms on most models and incurs the smallest accuracy loss at the same pruning rate; in particular, after pruning 30% of the convolution kernels of ResNet-110 on CIFAR-10, the accuracy loss is only 0.09%.
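To make the importance criterion concrete, the following is a minimal sketch, not the thesis implementation: it scores each kernel of one convolutional layer by the standard deviation of its per-image channel responses over a batch, under the assumption that standard deviation serves as the dispersion measure; the function name `dispersion_scores` and the 30% ratio in the usage example are illustrative.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def dispersion_scores(conv: nn.Conv2d, inputs: torch.Tensor) -> torch.Tensor:
    """Score each output kernel of `conv` by how much the feature maps it
    produces vary over a batch of `inputs` (shape N x C_in x H x W)."""
    feats = conv(inputs)                       # N x C_out x H' x W'
    n, c_out = feats.shape[:2]
    flat = feats.reshape(n, c_out, -1)         # flatten spatial dimensions
    channel_means = flat.mean(dim=2)           # per-image mean response, N x C_out
    # Dispersion across the batch: a low value means the kernel responds almost
    # identically to every input, i.e. it extracts little discriminative information.
    return channel_means.std(dim=0)            # C_out importance scores

# Usage: mark the 30% of kernels with the lowest dispersion for pruning.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
batch = torch.randn(32, 3, 32, 32)             # a CIFAR-10-sized batch
scores = dispersion_scores(conv, batch)
num_prune = int(0.3 * scores.numel())
prune_idx = torch.argsort(scores)[:num_prune]  # kernel indices to remove
```

In a cyclic pruning framework such a score would be recomputed and a small fraction of kernels removed in each round, with fine-tuning in between, so the model structure changes gradually rather than all at once.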
(2) We propose an enhanced attention knowledge distillation algorithm, which discards the teacher network's feature map channels with weaker feature intensity and selects only the guiding channels from the teacher's intermediate layers to generate an enhanced attention map that assists the training of the student network; that is, only the strong knowledge in the teacher network is transferred to the student, which improves the effect of knowledge distillation. In addition, this thesis uses an early stopping technique to reduce the weight of the knowledge distillation term in the overall loss function during the later stage of training, further strengthening the enhanced attention knowledge distillation algorithm. Experimental results on image classification datasets show that, compared with existing knowledge distillation algorithms, the enhanced attention knowledge distillation algorithm combined with the early stopping training strategy brings larger performance gains to the student network.
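A minimal sketch of how such an enhanced attention map might be formed from only the strongest teacher channels is given below; it is not the thesis implementation. Using mean absolute activation as the feature intensity, a `keep_ratio` parameter for the fraction of channels retained, and a squared-activation spatial attention map matched with an L2 loss are all illustrative assumptions, and teacher and student feature maps are assumed to share the same spatial size.

```python
import torch
import torch.nn.functional as F

def enhanced_attention_map(feats: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Build a spatial attention map from only the strongest channels of a
    teacher activation tensor `feats` of shape N x C x H x W."""
    n, c, h, w = feats.shape
    intensity = feats.abs().mean(dim=(2, 3))            # feature intensity per channel, N x C
    k = max(1, int(keep_ratio * c))                     # number of channels to keep
    top_idx = intensity.topk(k, dim=1).indices          # strongest channels per image
    selected = torch.gather(feats, 1, top_idx[:, :, None, None].expand(n, k, h, w))
    att = selected.pow(2).sum(dim=1)                    # squared-activation attention, N x H x W
    return F.normalize(att.reshape(n, -1), dim=1)       # flatten and L2-normalize

def enhanced_attention_loss(teacher_feats: torch.Tensor,
                            student_feats: torch.Tensor,
                            keep_ratio: float = 0.5) -> torch.Tensor:
    """Match the student's (all-channel) attention map to the teacher's enhanced map."""
    t_att = enhanced_attention_map(teacher_feats, keep_ratio)
    s_att = F.normalize(student_feats.pow(2).sum(dim=1).reshape(student_feats.size(0), -1), dim=1)
    return (t_att - s_att).pow(2).mean()
```

During training this term would be added to the student's task loss with a coefficient that is reduced, or dropped, after a chosen epoch, mirroring the early stopping weighting described above.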
Keywords/Search Tags: Deep convolutional neural network, feature map, pruning, knowledge distillation, feature intensity