
Simplification Of Deep Models: Storage Compression And Computational Acceleration

Posted on: 2019-11-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G Y Li    Full Text: PDF
GTID: 1368330551456850    Subject: Computer application technology
Abstract/Summary:
Models based on deep neural networks (DNNs) usually contain an enormous number of parameters, which makes them consume a lot of computing and storage resources. For some applications that rely on deep models, the platform on which they run may not provide the computational and storage resources the models require; in this case, the models need to be simplified. The purpose of simplifying deep models is to accelerate model computation or to compress model storage while maintaining model accuracy (measured with different evaluation metrics for different applications). In this dissertation, we carry out several research works on the acceleration and compression of deep models.

Firstly, for deep neural networks, which are an important component of deep models, we propose a general compression method that can greatly reduce the storage size of DNNs. A common approach to DNN compression is Magnitude-based Pruning (MP), which assumes that the absolute weight of each connection represents the importance of that connection: given a pruning threshold, all connections less important than the threshold are pruned. Layer-wise Magnitude-based Pruning (LMP), which has achieved remarkable performance in deep model compression, is essentially a variant of MP that applies MP to each layer of the deep network, with a separate pruning threshold for every layer. However, tuning the pruning thresholds for LMP is very hard: the number of threshold combinations grows exponentially with the number of layers, and it is difficult to determine a set of optimal thresholds for deep models with various structures. To address this problem, we propose Optimization-based Layer-wise Magnitude-based Pruning (OLMP), which transforms neural network pruning into a constrained single-objective optimization problem and solves it with a derivative-free optimization method, so that a set of pruning thresholds with good pruning performance can be generated automatically.
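To make the LMP procedure described above concrete, the following sketch applies magnitude-based pruning layer by layer with an individual threshold per layer. This is an illustrative Python/NumPy example, not the dissertation's implementation: the layer shapes and threshold values are hypothetical, and automatically finding good thresholds is precisely the problem OLMP addresses.

import numpy as np

def prune_layer(weights, threshold):
    # Zero out every connection whose absolute weight falls below the threshold.
    mask = np.abs(weights) >= threshold
    return weights * mask

def layerwise_magnitude_pruning(layers, thresholds):
    # Apply magnitude-based pruning to each layer with its own threshold (LMP).
    return [prune_layer(w, t) for w, t in zip(layers, thresholds)]

# Hypothetical weights and hand-picked thresholds; the number of threshold
# combinations grows exponentially with the number of layers.
layers = [np.random.randn(784, 300), np.random.randn(300, 100), np.random.randn(100, 10)]
thresholds = [0.5, 0.3, 0.1]
for w, p in zip(layers, layerwise_magnitude_pruning(layers, thresholds)):
    print(f"kept {np.count_nonzero(p)} of {w.size} connections")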
Secondly, we select three deep models for a specific application and investigate how to compress these models in an application-specific way. The application chosen for this work is machine translation. The practice of handling machine translation tasks with deep neural networks is collectively referred to as Neural Machine Translation (NMT), and the corresponding deep models are referred to as NMT models. The major storage cost of NMT models comes from the adopted DNNs. In existing work on compressing the DNNs inside NMT models, LMP is also a commonly used method, but the connectivity of those DNNs is very complicated, and existing applications of LMP do not take these special connectivities into account. This may restrict the effectiveness of LMP, since the sophisticated connectivities are ignored by an approach that simply groups connections by layer and applies MP with an individual pruning threshold for each group. In response to this problem, three representative NMT models are selected in this work, and the effects of different connection grouping strategies on pruning are studied in detail. Because the pruning performance of a model depends not only on the grouping strategy but also on the pruning thresholds, and in order to find the best possible pruned model once a grouping strategy has been selected, this work extends OLMP to arbitrary grouping strategies, so that a pruned model can be found automatically by the extended method. Finally, we test the pruning performance of different connection grouping strategies and identify appropriate grouping strategies for the three representative models.

Thirdly, we select a deep model for a specific application and investigate how to accelerate the computation of the non-DNN part of deep models. The model chosen for this work is the Region-based CNN (R-CNN) model for object detection. The R-CNN model contains two parts: the first analyzes the picture and segments the areas that may contain objects, called Regions of Interest (RoIs); the second uses a Convolutional Neural Network (CNN) to classify the RoIs and determine whether each region can be identified as a certain object. The problem of the R-CNN model in practical applications is that there is no effective acceleration method for RoI generation, which makes RoI generation one of the time bottlenecks. This dissertation studies this problem and proposes the Relief R-CNN (R2-CNN) method, which greatly simplifies RoI computation by extracting RoIs directly from the convolutional layers of the CNN. Compared with other RoI generation methods, which may cost 63.5% to 98.7% of the computation time during testing, RoI generation in R2-CNN consumes only 0.3% of the total time.
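As an illustration of the general idea of deriving region proposals directly from convolutional feature maps, the following Python sketch thresholds the strongest activations of a feature map and takes the bounding boxes of the resulting connected regions as candidate RoIs. The thresholding rule and the SciPy connected-component step are assumptions made for this example only; they are not the exact R2-CNN extraction algorithm.

import numpy as np
from scipy import ndimage

def rois_from_feature_map(feature_map, percentile=90):
    # feature_map: (channels, height, width) activations from a convolutional layer.
    # Returns bounding boxes (x1, y1, x2, y2) of high-activation regions.
    saliency = feature_map.max(axis=0)                     # collapse channels
    mask = saliency > np.percentile(saliency, percentile)  # keep the strongest responses
    labels, _ = ndimage.label(mask)                        # group connected regions
    boxes = []
    for region in ndimage.find_objects(labels):
        ys, xs = region
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))
    return boxes

# Example usage with a random feature map standing in for real CNN activations.
feature_map = np.random.rand(256, 38, 50)
print(rois_from_feature_map(feature_map)[:5])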
Keywords/Search Tags: Deep Learning, Deep Neural Network, Model Compression, Model Acceleration, Object Detection, Neural Machine Translation, Optimization