
Deep Neural Networks Compression And Acceleration Based On Interpretable Analysis

Posted on: 2021-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
Full Text: PDF
GTID: 2518306017473704
Subject: Computer Science and Technology
Abstract/Summary:
The remarkable performance of convolutional neural networks (CNNs) comes at the cost of a huge number of parameters and heavy computation, which has become a bottleneck limiting the exploitation of their full potential. Compressing and accelerating CNNs has therefore received ever-increasing research attention. However, most existing CNN compression and acceleration methods do not interpret the internal structure of a network to identify its implicit redundancy. To compress and accelerate CNNs based on their interpretable redundancy, we first analyze their internal working mechanisms and then propose two compression and acceleration methods that are closer to the essence of neural networks. The specific research content and contributions are summarized as follows:

(1) Exploiting kernel sparsity and entropy for CNN compression. We investigate CNN compression from a novel, interpretable perspective and find that sparsity and information richness are the key factors for evaluating the importance of feature maps. To speed up the compression process, we then reveal the relationship between the input feature maps and their corresponding 2D kernels within a theoretical framework. Based on this, a kernel sparsity and entropy (KSE) indicator is proposed to quantify the importance of each feature map in a feature-agnostic manner (a sketch of such an indicator is given below). Finally, kernel clustering is employed to reduce the number of kernels associated with each input channel. Our method demonstrates superior performance over previous approaches.

(2) A network architecture decoupling method for CNN acceleration. Based on the internal working mechanism of the network, namely that filters respond differently to different input images, we propose a dynamic pruning method to accelerate the network. First, an architecture controlling module is introduced and embedded into each layer to dynamically identify the activated filters (a sketch of such a module is given below). Then, by maximizing the mutual information between the architecture encoding vector and the input image, the network architecture is decoupled to accelerate the computation for each input. Meanwhile, to further improve the discriminative power of the network and reduce inference time, we constrain the outputs of the convolutional layers and sparsify the computation paths. Our method achieves a significant real speedup on a CPU while maintaining performance similar to that of the original network.
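To make contribution (1) concrete, the following is a minimal PyTorch sketch of a kernel-sparsity-and-entropy style indicator. The abstract only states that sparsity and information richness are combined into a feature-agnostic score; the exact combination used here (v = sqrt(sparsity / (1 + alpha * entropy))), the k-nearest-neighbour density estimate for the entropy term, and the balance parameter `alpha` are assumptions for illustration, not the thesis's exact formulation.

```python
import torch

def kse_indicator(weight: torch.Tensor, k: int = 5, alpha: float = 1.0) -> torch.Tensor:
    """Score each input channel of a conv weight (N, C, Kh, Kw) by kernel
    sparsity and kernel entropy; a higher score means a more important
    feature map. Assumes N > k. Illustrative only, not the thesis's formula."""
    n, c, kh, kw = weight.shape
    # Group the N 2D kernels acting on each of the C input channels.
    kernels = weight.permute(1, 0, 2, 3).reshape(c, n, kh * kw)  # (C, N, Kh*Kw)

    # Kernel sparsity: total L1 mass of the N kernels on each input channel.
    sparsity = kernels.abs().sum(dim=(1, 2))  # (C,)

    # Kernel entropy: entropy of a density estimate built from
    # k-nearest-neighbour distances among the N kernels of each channel.
    dist = torch.cdist(kernels, kernels)                    # (C, N, N)
    knn = dist.topk(k + 1, largest=False).values[..., 1:]   # drop self-distance
    density = knn.sum(dim=-1)                               # (C, N)
    p = density / density.sum(dim=-1, keepdim=True)
    entropy = -(p * (p + 1e-12).log()).sum(dim=-1)          # (C,)

    # Combine: sparse, low-entropy (information-rich) channels score high.
    v = (sparsity / (1.0 + alpha * entropy)).sqrt()
    return (v - v.min()) / (v.max() - v.min() + 1e-12)      # normalise to [0, 1]
```

Channels whose score falls below a chosen threshold would be candidates for pruning or for kernel clustering, as the abstract describes.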
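For contribution (2), the sketch below shows one plausible form of a per-layer architecture controlling module: a lightweight gate that predicts, per input, which filters of the layer to compute, with a straight-through estimator so the hard 0/1 gates remain trainable. The squeeze-style design, the 0.5 threshold, and the class name are hypothetical; the mutual-information objective between the architecture encoding vector and the input image mentioned in the abstract is omitted here.

```python
import torch
import torch.nn as nn

class ArchitectureController(nn.Module):
    """Per-layer gate that predicts, from the incoming feature map, a
    binary-like vector selecting which filters to activate. Hypothetical
    design sketch; not the thesis's exact module."""
    def __init__(self, in_channels: int, num_filters: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze spatial dims
        self.fc = nn.Linear(in_channels, num_filters)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.fc(self.pool(x).flatten(1))    # (B, num_filters)
        soft = torch.sigmoid(logits)
        # Straight-through estimator: hard 0/1 gates in the forward pass,
        # sigmoid gradients in the backward pass.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()

# Usage sketch: mask the layer's output filters per input image, e.g.
#   gates = controller(x)                  # (B, num_filters)
#   out = conv(x) * gates[:, :, None, None]
```

At inference time, the hard gate vector (the "architecture encoding") determines a sparse computation path per image, which is what yields the real CPU speedup claimed in the abstract.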
Keywords/Search Tags: Deep Neural Networks, Network Compression and Acceleration, Interpretable Analysis, Parameter Pruning, Kernel Clustering