
Model Compression Techniques for Deep Neural Networks

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhong
Full Text: PDF
GTID: 2428330626464651
Subject: Software engineering
Abstract/Summary:
With the availability of large-scale datasets and the growth of computing power, Convolutional Neural Networks (CNNs) have achieved great success in face recognition, object detection, tracking, and image segmentation. To improve performance, academia has designed network structures that are ever deeper and wider, with larger capacity and higher computational complexity. Such large-scale models run smoothly on high-performance computers, but they are hard to deploy on mobile platforms constrained by computing resources and power consumption, such as smartphones and embedded devices. Against this background, compressing CNN models has attracted considerable attention, and pruning CNN filters in particular has become a popular research direction due to its high compression rate and acceleration ability. With pruning, large models can be compressed into lightweight models of comparable performance and transplanted to mobile devices, so that academic results can realize greater value in industry. This thesis proposes pruning-based model compression algorithms from two different perspectives.

This thesis argues and demonstrates that where to prune is a critical issue in the pruning task, and proposes a filter pruning method based on learning the pruning position. Considering the hierarchical structure of CNNs, a long short-term memory (LSTM) network is employed as an evaluation model to find the least important layer and generate the pruning decision. First, the neural network is encoded as a string representation and fed into the LSTM, which decides whether or not to prune each layer. Then a channel-based criterion is used to evaluate the importance of each filter in the chosen layer, and some unimportant filters are pruned, combined with a recovery mechanism that restores part of the accuracy lost to pruning. The LSTM is updated with the policy gradient method, using both performance and complexity as the reward.

To address the performance degradation caused by pruning, an adaptive filter pruning method based on the attention mechanism, Squeeze-Excitation-Pruning (SEP), is proposed. The SEP module operates on the feature-channel dimension and is used to reconstruct the baseline model. The SEP operation is performed in the preceding convolutional layer: it generates an importance weight vector that scales the feature maps of the next convolutional layer, and it sets some low-importance weights to zero. SEP is a data-dependent adaptive filter pruning method: for different input images, different filters are soft-pruned according to the SEP selection, i.e., the corresponding convolution operations are skipped.

Detailed and comprehensive experiments are carried out. To verify the generality of the algorithms, three network structures (VGG19, ResNet56, and a fully connected model) are evaluated on three benchmark datasets (CIFAR-10, CIFAR-100, and MNIST). The experimental results are compared and analyzed against existing algorithms. The results show that the proposed pruning methods can substantially compress a variety of network structures while retaining comparable accuracy, outperforming other state-of-the-art methods.
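The channel-based importance evaluation for filters in a chosen layer can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: the L1-norm importance criterion, the pruning ratio, and the function names are assumptions for the sake of the example.

```python
import numpy as np

def filter_importance(weights):
    # weights: (out_channels, in_channels, kH, kW)
    # Score each filter by its L1 norm (an assumed importance criterion).
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def prune_filters(weights, ratio):
    # Drop the lowest-scoring filters; keep the rest in their original order.
    scores = filter_importance(weights)
    n_prune = int(len(scores) * ratio)
    keep = np.sort(np.argsort(scores)[n_prune:])
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))       # a toy conv layer with 8 filters
pruned_w, kept_idx = prune_filters(w, ratio=0.25)
```

In a real pipeline, the recovery mechanism described above would follow this step: the pruned model is fine-tuned for a few epochs to regain the accuracy lost by removing filters.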
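The policy-gradient update used to train the LSTM controller can be illustrated with a bare REINFORCE step on a single categorical decision. This toy sketch omits the LSTM and the performance/complexity reward entirely; the learning rate, reward value, and update loop are all assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_step(logits, action, reward, lr=0.1):
    # REINFORCE for one categorical decision:
    # grad of log pi(action) w.r.t. logits = one_hot(action) - softmax(logits)
    probs = softmax(logits)
    grad = -probs
    grad[action] += 1.0
    return logits + lr * reward * grad

# If action 0 consistently earns positive reward, its probability rises.
logits = np.zeros(3)
for _ in range(100):
    logits = reinforce_step(logits, action=0, reward=1.0)
```

In the method described above, the reward would instead combine the pruned model's accuracy and its computational cost, so the controller learns to prune layers that hurt performance least.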
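The SEP forward pass (squeeze, excitation, then soft pruning of low-weight channels) can be sketched in NumPy. This is a hypothetical single-sample illustration: the bottleneck reduction ratio, the ReLU/sigmoid choices, and the per-input pruning ratio are assumptions, not the thesis's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sep_forward(feature_map, w1, w2, prune_ratio):
    # feature_map: (C, H, W) activations for one input sample
    c = feature_map.shape[0]
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with a sigmoid gate -> per-channel weights
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Pruning: zero the lowest-weight channels for THIS input (soft pruning)
    mask = np.ones(c)
    mask[np.argsort(s)[:int(c * prune_ratio)]] = 0.0
    return feature_map * (s * mask)[:, None, None], mask

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8, 8))        # 16 channels
w1 = rng.normal(size=(4, 16))          # reduction ratio 4 (assumed)
w2 = rng.normal(size=(16, 4))
out, mask = sep_forward(x, w1, w2, prune_ratio=0.25)
```

Because the mask depends on the input through the excitation weights, different images suppress different channels, which matches the data-dependent behavior described above.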
Keywords/Search Tags: Model Compression, Pruning, Reinforcement Learning, Attention Mechanism