
Research on a Parameter-Exchanging Optimization Mechanism in Distributed Deep Learning

Posted on: 2016-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wang
Full Text: PDF
GTID: 2348330479953410
Subject: Computer software and theory
Abstract/Summary:
Deep learning has become increasingly important in cutting-edge applications. As input datasets and neural networks grow, a single machine suffers from memory shortages and excessive training time. Leveraging a distributed system is therefore a valuable implementation strategy for deep learning. Among distributed models, the parameter server framework generally applies parallel Stochastic Gradient Descent (SGD) for training.

A parameter server framework does effectively improve training speed. However, with very large neural networks, exchanging parameters between the worker nodes and the parameter server node consumes too much of the parallel training time and becomes a serious bottleneck. Recent efforts rely on manually tuning the parameter-exchanging interval to reduce communication overhead. Because this interval is fixed, parameter-exchanging requests from different worker nodes arrive at almost the same time, causing a request burst on the parameter server. The server then has to queue those requests, so the bottleneck remains.

To address these problems, this thesis proposes an approach and a system that automatically set the optimal parameter-exchanging interval, removing the parameter-exchanging bottleneck while maintaining training accuracy. In addition, the parallel Stochastic Gradient Descent algorithm is optimized according to the volume of parameter data that fits into memory. The working mechanism is as follows: by monitoring the resource usage of the parameter server node, each training node automatically chooses a different optimal interval within given limits so as to avoid request bursts, and an exchange is forced once the interval reaches its upper limit, thereby removing the parameter-exchanging bottleneck while maintaining accuracy.

The evaluation shows that the approach speeds up the parameter-exchanging process by a factor of eight while maintaining accuracy. It effectively removes the parameter-exchanging bottleneck and the request burst, and thus increases training speed.
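The mechanism summarized above can be illustrated with a short sketch. The Python pseudocode below shows one possible way a worker could adapt its parameter-exchanging interval to the parameter server's load and stagger its requests; the class and method names (`AdaptiveExchangeWorker`, `query_load`, `push`, `pull`, `local_sgd_step`) and the interval limits are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of the adaptive parameter-exchanging interval described above.
# The server and model interfaces used here are assumptions for illustration only.

import random


class AdaptiveExchangeWorker:
    """Worker-side loop: train locally with SGD and exchange parameters with the
    parameter server at an interval derived from the server's current load."""

    def __init__(self, server, min_interval=1, max_interval=32):
        self.server = server                  # client handle to the parameter server (assumed)
        self.min_interval = min_interval      # exchange at least this often (iterations)
        self.max_interval = max_interval      # hard limit: an exchange is forced at this point
        self.steps_since_exchange = 0

    def choose_interval(self):
        """Pick an interval between the limits based on server load, then add a
        small random offset so workers do not all send requests at the same time."""
        load = self.server.query_load()       # assumed API: current load as a fraction in [0, 1]
        span = self.max_interval - self.min_interval
        interval = self.min_interval + int(load * span)
        jitter = random.randint(0, max(1, interval // 4))
        return min(interval + jitter, self.max_interval)  # never exceed the forced-exchange limit

    def train(self, model, data_iter, num_steps):
        interval = self.choose_interval()
        for _ in range(num_steps):
            model.local_sgd_step(next(data_iter))   # one local mini-batch SGD update (assumed API)
            self.steps_since_exchange += 1
            # Exchange once the adaptive interval elapses; because the interval is
            # capped at max_interval, the exchange is forced at the hard limit.
            if self.steps_since_exchange >= interval:
                self.server.push(model.get_params())    # send local parameters to the server
                model.set_params(self.server.pull())    # fetch the merged global parameters
                self.steps_since_exchange = 0
                interval = self.choose_interval()       # re-adapt for the next round
```

In this sketch, the random offset on top of the load-derived interval is what keeps workers from exchanging in lock-step, while the hard cap on the interval bounds how stale any worker's local parameters can become, which is how the approach aims to preserve training accuracy.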
Keywords/Search Tags: Stochastic Gradient Descent, Parameter-Exchanging Bottleneck, Parameter Server, Deep Learning, Distributed System