Deep learning has become increasingly important in cutting-edge applications. As input datasets and neural networks grow, a single machine suffers from memory shortages and prohibitively long training times, so leveraging a distributed system is considered a valuable implementation practice for deep learning. Among distributed models, the parameter server framework generally applies parallel Stochastic Gradient Descent (SGD) for training.

A parameter server framework effectively improves training speed. However, with very large neural networks, exchanging parameters between worker nodes and the parameter server node wastes too much parallel training time and creates a serious bottleneck. Recent efforts rely on manually tuning the parameter-exchanging interval to reduce communication overhead. Because the interval is fixed, parameter-exchanging requests from different worker nodes arrive almost simultaneously, causing a request burst at the parameter server. As a result, the server has to queue those requests and the bottleneck remains.

To address these problems, an approach and a system for automatically setting the optimal parameter-exchanging interval are proposed, removing the parameter-exchanging bottleneck while maintaining training accuracy. Moreover, the parallel Stochastic Gradient Descent algorithm is optimized based on the data size of the parameters that can fit into memory. The working mechanism is as follows: by monitoring the resource usage of the parameter server node, the system automatically chooses a different optimal interval, within limits, on each training node to avoid request bursts, and forces an exchange when the interval reaches its limit, thereby removing the parameter-exchanging bottleneck while maintaining accuracy.

The evaluation shows that the approach speeds up the parameter-exchanging process by eight times while maintaining accuracy. It effectively removes the parameter-exchanging bottleneck and request bursts, and therefore increases training speed.
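A minimal sketch of this adaptive-interval idea is shown below, under stated assumptions: helper names such as current_server_load(), local_sgd_step(), and exchange_parameters() are hypothetical stand-ins for the system's real monitoring, training, and communication calls, and the interval formula is illustrative rather than the paper's actual policy. Each worker picks its own interval within a cap, staggered by worker rank and stretched under high server load, and an exchange is forced once the cap is reached.

```python
import random

MAX_INTERVAL = 64      # hard cap on local iterations between exchanges (forces an exchange)
BASE_INTERVAL = 16     # baseline interval before load and stagger adjustment


def current_server_load() -> float:
    """Stand-in for monitoring the parameter server's resource usage (0.0-1.0)."""
    return random.random()


def local_sgd_step() -> None:
    """Stand-in for one local mini-batch SGD update on the worker."""
    pass


def exchange_parameters(worker_rank: int) -> None:
    """Stand-in for pushing gradients to / pulling parameters from the server."""
    print(f"worker {worker_rank}: exchanging parameters")


def choose_interval(worker_rank: int, num_workers: int, server_load: float) -> int:
    """Pick a per-worker interval so exchange requests arrive staggered.

    Higher server load stretches the interval; the rank-dependent offset spreads
    arrival times across workers so they do not burst at the server at once.
    The result is clamped to MAX_INTERVAL, at which point an exchange is forced.
    """
    interval = int(BASE_INTERVAL * (1.0 + server_load)) + (worker_rank % num_workers)
    return min(interval, MAX_INTERVAL)


def training_loop(worker_rank: int, num_workers: int, steps: int) -> None:
    since_exchange = 0
    interval = choose_interval(worker_rank, num_workers, current_server_load())
    for _ in range(steps):
        local_sgd_step()
        since_exchange += 1
        # Exchange when the adaptively chosen interval elapses, or force it at the cap.
        if since_exchange >= interval or since_exchange >= MAX_INTERVAL:
            exchange_parameters(worker_rank)
            since_exchange = 0
            interval = choose_interval(worker_rank, num_workers, current_server_load())


if __name__ == "__main__":
    training_loop(worker_rank=0, num_workers=4, steps=100)
```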