
On The Depth And Big Model Of Deep Neural Networks: Theory And Algorithm

Posted on: 2019-06-16    Degree: Doctor    Type: Dissertation
Country: China    Candidate: S Z Sun    Full Text: PDF
GTID: 1368330599965129    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep neural networks (DNNs) have achieved great success in many applications. In fact, DNNs did not become widely used until 2006, even though many of the underlying techniques had been proposed in the 1990s. Essentially, two driving forces underlie the success of DNNs after 2006: increasing depth and growing model size. To successfully increase depth, many techniques have been proposed, e.g., auto-encoders, batch normalization, and residual networks. Simultaneously, to handle the growing model size efficiently, parallel training frameworks such as data parallelism and model parallelism have been developed. However, these techniques are far from sufficient for moving toward better deep learning. First, although many techniques increase depth, an important question is how to understand the advantages and disadvantages of depth from a theoretical point of view. Second, most parallel algorithms are inherited directly from convex optimization, while DNNs are highly non-convex models; a natural question is therefore how to handle the non-convexity of DNNs during parallel training. Third, another difference between DNNs and traditional shallow models is that DNNs contain many redundant parameters, which causes extremely high communication cost during parallel training; how to handle this redundancy is another challenge.

To tackle these challenges, this thesis makes the following investigations. First, we propose uniform upper bounds for the representation ability and the model capacity of DNNs. Based on these bounds, we analyze the pros and cons of depth, and, guided by this analysis, we propose to improve the performance of DNNs by maximizing the margin. Second, we prove that model averaging, the model aggregation method commonly used in data parallelism, cannot provide a performance guarantee for the global model. We therefore propose to use an ensemble as the aggregation method, and we design a new parallel training framework based on the ensemble. Third, we propose to regard communication-efficient distributed deep learning as a multi-agent system, giving concrete definitions for the actions, environments, and utility. Based on this multi-agent formulation, we propose a best-response strategy to reduce the communication cost, i.e., transferring only the non-redundant parameters (or gradients) during communication.
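To make the margin-maximization idea concrete, here is a minimal sketch, in PyTorch, of training a network with a hinge-style multi-class margin loss that penalizes small gaps between the true-class logit and the others. The network architecture and hyperparameters are placeholders, and this loss is a standard stand-in rather than the thesis's actual objective.

import torch
import torch.nn as nn

# Placeholder network; the thesis does not fix a specific architecture here.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Hinge-style multi-class margin loss: encourages the true-class logit
# to exceed every other logit by at least `margin`.
margin_loss = nn.MultiMarginLoss(margin=1.0)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    optimizer.zero_grad()
    loss = margin_loss(model(x), y)  # small margins incur hinge penalty
    loss.backward()
    optimizer.step()
    return loss.item()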
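The contrast between model averaging and ensemble aggregation can also be illustrated with a short sketch. The function names are hypothetical and this is not the thesis's parallel framework; it only shows the two aggregation rules side by side.

import copy
import torch

def average_models(models):
    # Parameter averaging: for a non-convex DNN, the averaged weights
    # carry no guarantee of matching the local models' performance.
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for p_avg, *ps in zip(avg.parameters(),
                              *(m.parameters() for m in models)):
            p_avg.copy_(torch.stack(ps).mean(dim=0))
    return avg

def ensemble_predict(models, x):
    # Ensemble aggregation: average the local models' predictions
    # instead of their weights.
    with torch.no_grad():
        return torch.stack([m(x).softmax(dim=-1)
                            for m in models]).mean(dim=0)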
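One simple way to read "transferring only the non-redundant parameters (or gradients)" is top-k gradient sparsification, sketched below. This is an assumed illustration of the general idea, not the best-response protocol derived in the thesis: each worker communicates only the largest-magnitude gradient entries.

import torch

def compress_gradient(grad, ratio=0.01):
    # Keep only the largest-magnitude entries; only the pair
    # (indices, values) needs to be communicated.
    k = max(1, int(grad.numel() * ratio))
    flat = grad.flatten()
    _, indices = flat.abs().topk(k)
    return indices, flat[indices]

def decompress_gradient(indices, values, shape):
    # Reconstruct a dense gradient with zeros in the dropped positions.
    flat = torch.zeros(shape).flatten()
    flat[indices] = values
    return flat.view(shape)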
Keywords/Search Tags: deep learning, generalization, distributed machine learning, data parallelism