
Research On Deep Neural Networks Compression And Acceleration

Posted on: 2020-03-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S H Lin
Full Text: PDF
GTID: 1488305723483774
Subject: Intelligent Science and Technology
Abstract/Summary:
In recent years, deep neural networks (DNNs) have developed rapidly and achieved remarkable success in many artificial intelligence (AI) applications, such as image understanding, speech recognition, and natural language processing, and they have become one of the central research topics in AI. They are also widely used in industry, including video surveillance, game playing, medical assistance, and autonomous driving, as they significantly improve performance across tasks in these areas. However, as the performance of DNNs has improved, the networks have become deeper and wider, which sharply increases their number of parameters and their computational complexity. At the same time, with the widespread use of mobile and embedded devices (e.g., mobile phones, drones, and robots), there is growing demand to deploy DNNs on such devices to further enhance their capabilities. This would be of great value at the military level, for reconnaissance and disaster relief, and at the civilian level, for mobile intelligent recognition and convenient travel. However, existing complex DNNs cannot be directly stored and run on resource-limited devices. Reducing parameter redundancy and improving inference efficiency through DNN compression and acceleration has therefore become an effective solution, one of great theoretical significance and application value.

Aiming at the problem of parameter redundancy in DNNs, this dissertation investigates general methods of low-rank decomposition and parameter pruning for the two related tasks of compression and acceleration, with a particular focus on convolutional neural networks (CNNs). The specific research content and contributions are summarized as follows:

(1) A closed-form low-rank decomposition and knowledge transfer based method is proposed for holistic CNN compression. Conventional methods treat the speedup of convolutional layers and the memory reduction of fully-connected layers as two separate tasks that cannot be handled simultaneously. Moreover, they apply local compression in a layer-wise manner, which cannot explicitly optimize the final accuracy of the network. To address these problems, this dissertation proposes a holistic CNN compression framework that simultaneously accelerates and compresses CNNs. First, a closed-form low-rank decomposition (LRD) is proposed to simultaneously speed up convolutional computation and reduce memory consumption. To improve the accuracy of the compressed network and to counter vanishing gradients during training, a novel knowledge transfer (KT) method is presented that aligns the final outputs and intermediate responses of the compressed network with those of the original one (see the first sketch after this list). Evaluated on several public image classification datasets with different models, the proposed method achieves the best trade-off between accuracy and compression/speedup rate. For example, compared to the original VGG-16, it achieves 41.92x compression and a 2.33x GPU speedup with only a 0.18% increase in top-1 error.
(2) A global and dynamic pruning based method is proposed for CNN acceleration. Although low-rank decomposition can greatly compress a model by factorizing its weights into several small matrices, it greatly increases the number of data-access operations during inference. Moreover, it cannot reduce the number of output feature maps, which increases the communication bandwidth required during computation. These problems can be effectively addressed by structured pruning. However, existing structured pruning schemes prune CNNs in a fixed, layer-by-layer manner, which is less adaptive, less efficient, and less effective. To address this, the dissertation proposes a global & dynamic pruning (GDP) scheme that rapidly prunes filters offline and recalls filters that were mistakenly pruned, improving the accuracy of the pruned network. A global mask is attached after each filter to determine its saliency score, and a global, dynamic objective function is constructed and solved by a greedy alternating update strategy: a Taylor-expansion-based global mask temporarily removes filters with small saliency scores, while stochastic gradient descent updates the filters (see the second sketch after this list). Compared to state-of-the-art filter pruning methods, the proposed method achieves the best trade-off between accuracy and speedup rate. For example, when pruning ResNet-50, it achieves a 2.45x FLOPs reduction and a 1.93x actual CPU speedup with only a 2.16% increase in top-5 classification error.

(3) A generative adversarial learning based method is proposed for optimal structured CNN pruning. The global and dynamic pruning method suffers from two issues: a lack of slackness and a strong dependency on labels. To address them, this dissertation proposes a heterogeneous-structure pruning method that is label-free and end-to-end. It introduces a soft mask with sparsity regularization for each model structure, representing that structure's redundancy. To learn better model weights and masks, a new structured pruning objective function is constructed and solved by label-free generative adversarial learning combined with the fast iterative shrinkage-thresholding algorithm (FISTA), after which the redundant structures can be reliably removed (see the third sketch after this list). Quantitative experiments show that the pruned ResNet-50 achieves a 3.7x FLOPs reduction and 2.5x parameter compression with only a 3.75% increase in top-5 classification error, significantly outperforming state-of-the-art methods.
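First sketch (for item (1)): a rough PyTorch illustration of decomposing one convolutional layer by truncated SVD into a rank-r convolution followed by a 1x1 convolution, paired with a knowledge-transfer loss that aligns final outputs and intermediate responses with the original network. The rank choice, temperature, and loss weighting here are illustrative assumptions, not the dissertation's exact closed-form construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def decompose_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Replace one k x k conv with a k x k conv into `rank` channels
    followed by a 1 x 1 conv restoring the original output channels."""
    out_c, in_c, kh, kw = conv.weight.shape
    w = conv.weight.data.reshape(out_c, in_c * kh * kw)  # flatten filters
    u, s, vT = torch.linalg.svd(w, full_matrices=False)  # w = U S V^T
    u, s, vT = u[:, :rank], s[:rank], vT[:rank, :]       # truncate to rank r
    first = nn.Conv2d(in_c, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    second = nn.Conv2d(rank, out_c, 1, bias=conv.bias is not None)
    first.weight.data = vT.reshape(rank, in_c, kh, kw)
    second.weight.data = (u * s).reshape(out_c, rank, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return nn.Sequential(first, second)

def kt_loss(student_logits, teacher_logits, student_feats, teacher_feats,
            alpha=0.5, T=4.0):
    """Align final outputs (soft targets) and intermediate responses."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hint = sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    return alpha * soft + (1.0 - alpha) * hint
```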
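Second sketch (for item (2)): a minimal rendering of the global & dynamic mask mechanism under assumed PyTorch conventions. Each filter is gated by a binary mask applied only at forward time, so masked filters keep receiving SGD updates and can be recalled; the masks are refreshed from a first-order Taylor saliency thresholded globally across all layers. The saliency formula and keep ratio are simplified assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv whose output filters are gated at forward time; the underlying
    weights still receive SGD updates, so a masked filter can be recalled
    when the global mask is refreshed."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_buffer("mask", torch.ones(self.out_channels))

    def forward(self, x):
        w = self.weight * self.mask.view(-1, 1, 1, 1)
        return F.conv2d(x, w, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

@torch.no_grad()
def refresh_masks(convs, keep_ratio=0.6):
    """Recompute one GLOBAL threshold across all layers; call after
    loss.backward() so that conv.weight.grad is populated."""
    saliency = [  # first-order Taylor term |w * dL/dw| per output filter
        (c.weight * c.weight.grad).abs().sum(dim=(1, 2, 3)) for c in convs
    ]
    flat = torch.cat(saliency)
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()  # keep the top-k filters globally
    for c, s in zip(convs, saliency):
        c.mask.copy_((s >= threshold).float())
```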
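Third sketch (for item (3)): the FISTA update that learns the sparse soft masks. The smooth part of the objective (left abstract here) would be the label-free adversarial feature-alignment loss; the L1 sparsity term is handled by the soft-thresholding proximal step. Names, step size, and regularization strength are illustrative.

```python
import torch

def soft_threshold(x, thresh):
    # prox of thresh * ||.||_1: shrink magnitudes toward zero; exact zeros appear
    return torch.sign(x) * torch.clamp(x.abs() - thresh, min=0.0)

class FISTAMask:
    """Learns a sparse soft mask. `mask` holds the extrapolation point y_k,
    the tensor the pruned network uses in its forward pass (so it must have
    requires_grad=True); the proximal iterates x_k are kept internally."""
    def __init__(self, mask: torch.Tensor, lr: float = 0.01, lam: float = 1e-3):
        self.mask, self.lr, self.lam = mask, lr, lam
        self.x_prev = mask.detach().clone()
        self.t = 1.0

    @torch.no_grad()
    def step(self):
        # Proximal gradient step at y_k; call after loss.backward(), so
        # mask.grad holds the gradient of the smooth adversarial loss.
        x = soft_threshold(self.mask - self.lr * self.mask.grad,
                           self.lr * self.lam)
        t_next = (1.0 + (1.0 + 4.0 * self.t ** 2) ** 0.5) / 2.0
        # Extrapolate: y_{k+1} = x_k + ((t_k - 1) / t_{k+1}) (x_k - x_{k-1})
        self.mask.copy_(x + ((self.t - 1.0) / t_next) * (x - self.x_prev))
        self.x_prev.copy_(x)
        self.t = t_next
```

Structures whose mask is driven exactly to zero by the proximal step are the redundant ones removed at the end of training.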
Keywords/Search Tags: Deep Neural Networks, Network Compression and Acceleration, Low-rank Decomposition, Parameter Pruning, Knowledge Distillation