
Deep Neural Networks Compression And Acceleration Based On Interpretable Analysis

Posted on: 2021-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
Full Text: PDF
GTID: 2518306017473704
Subject: Computer Science and Technology
Abstract/Summary:
The remarkable performance of convolutional neural networks (CNNs) comes at the cost of a huge number of parameters and heavy computation, which has become a bottleneck limiting the exploitation of their full potential. Compressing and accelerating CNNs has therefore received ever-increasing research attention. However, most existing CNN compression and acceleration methods do not interpret the internal structure of a network to identify its implicit redundancy. To compress and accelerate CNNs based on their interpretable redundancy, we first analyze their internal working mechanisms and then propose two compression and acceleration methods that are closer to the essence of neural networks. The specific research content and contributions are summarized as follows:

(1) Exploiting kernel sparsity and entropy for CNN compression. We investigate CNN compression from a novel, interpretable perspective and find that sparsity and information richness are the key factors for evaluating the importance of feature maps. To speed up the compression process, we then reveal the relationship between the input feature maps and their corresponding 2D kernels within a theoretical framework. Based on this, a kernel sparsity and entropy (KSE) indicator is proposed to quantify the importance of each feature map in a feature-agnostic manner (a sketch of such an indicator is given below). Finally, kernel clustering is employed to reduce the number of kernels associated with each input channel. Our method demonstrates superior performance over previous approaches.

(2) A network architecture decoupling method for CNN acceleration. Based on the internal working mechanism of the network, namely that filters respond differently to different input images, we propose a dynamic pruning method to accelerate the network. First, an architecture controlling module is introduced and embedded into each layer to dynamically identify the activated filters (a sketch of such a module is given below). Then, by maximizing the mutual information between the architecture encoding vector and the input image, the network architecture is decoupled to accelerate the computation for each input. Meanwhile, to further improve the discriminative power of the network and reduce inference time, we constrain the outputs of the convolutional layers and sparsify the computation paths. Our method achieves a significant real speedup on a CPU while maintaining performance similar to that of the original network.
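To make contribution (1) concrete, the following is a minimal PyTorch sketch of a kernel-sparsity-and-entropy style indicator. The abstract only states that sparsity and information richness are combined into a feature-agnostic score; the exact combination used here (v = sqrt(sparsity / (1 + alpha * entropy))), the k-nearest-neighbour density estimate for the entropy term, and the balance parameter `alpha` are assumptions for illustration, not the thesis's exact formulation.

```python
import torch

def kse_indicator(weight: torch.Tensor, k: int = 5, alpha: float = 1.0) -> torch.Tensor:
    """Score each input channel of a conv weight (N, C, Kh, Kw) by kernel
    sparsity and kernel entropy; a higher score means a more important
    feature map. Assumes N > k. Illustrative only, not the thesis's formula."""
    n, c, kh, kw = weight.shape
    # Group the N 2D kernels acting on each of the C input channels.
    kernels = weight.permute(1, 0, 2, 3).reshape(c, n, kh * kw)  # (C, N, Kh*Kw)

    # Kernel sparsity: total L1 mass of the N kernels on each input channel.
    sparsity = kernels.abs().sum(dim=(1, 2))  # (C,)

    # Kernel entropy: entropy of a density estimate built from
    # k-nearest-neighbour distances among the N kernels of each channel.
    dist = torch.cdist(kernels, kernels)                    # (C, N, N)
    knn = dist.topk(k + 1, largest=False).values[..., 1:]   # drop self-distance
    density = knn.sum(dim=-1)                               # (C, N)
    p = density / density.sum(dim=-1, keepdim=True)
    entropy = -(p * (p + 1e-12).log()).sum(dim=-1)          # (C,)

    # Combine: sparse, low-entropy (information-rich) channels score high.
    v = (sparsity / (1.0 + alpha * entropy)).sqrt()
    return (v - v.min()) / (v.max() - v.min() + 1e-12)      # normalise to [0, 1]
```

Channels whose score falls below a chosen threshold would be candidates for pruning or for kernel clustering, as the abstract describes.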
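For contribution (2), the sketch below shows one plausible form of a per-layer architecture controlling module: a lightweight gate that predicts, per input, which filters of the layer to compute, with a straight-through estimator so the hard 0/1 gates remain trainable. The squeeze-style design, the 0.5 threshold, and the class name are hypothetical; the mutual-information objective between the architecture encoding vector and the input image mentioned in the abstract is omitted here.

```python
import torch
import torch.nn as nn

class ArchitectureController(nn.Module):
    """Per-layer gate that predicts, from the incoming feature map, a
    binary-like vector selecting which filters to activate. Hypothetical
    design sketch; not the thesis's exact module."""
    def __init__(self, in_channels: int, num_filters: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze spatial dims
        self.fc = nn.Linear(in_channels, num_filters)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.fc(self.pool(x).flatten(1))    # (B, num_filters)
        soft = torch.sigmoid(logits)
        # Straight-through estimator: hard 0/1 gates in the forward pass,
        # sigmoid gradients in the backward pass.
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()

# Usage sketch: mask the layer's output filters per input image, e.g.
#   gates = controller(x)                  # (B, num_filters)
#   out = conv(x) * gates[:, :, None, None]
```

At inference time, the hard gate vector (the "architecture encoding") determines a sparse computation path per image, which is what yields the real CPU speedup claimed in the abstract.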
Keywords/Search Tags: Deep Neural Networks, Network Compression and Acceleration, Interpretable Analysis, Parameter Pruning, Kernel Clustering