Deep learning has developed rapidly in recent years, has been applied ever more widely, and has achieved outstanding results in many fields. With the arrival of the big data era, the amount of data has grown exponentially and deep learning models have become increasingly large. Training deep learning models on a single machine can no longer meet practical needs, so distributed deep learning has become an important research direction. Many research teams and technology companies have improved deep learning algorithms from different perspectives and have accumulated considerable experience with, and methods for, distributed training.

However, in actual training we find the following problems. First, existing machine learning frameworks mainly provide general-purpose algorithm libraries and are not specifically optimized for particular problems, such as the non-convex optimization problems that arise when training convolutional neural networks. As a result, training a neural network in a distributed environment often fails to achieve the ideal speedup, and the model may even fail to converge. Second, in heterogeneous clusters, differences in machine performance seriously degrade the stability and effectiveness of the distributed stochastic gradient descent algorithm, and the actual results fall far short of the expected results.

To address these problems, this paper proposes a distributed asynchronous stochastic gradient descent algorithm based on the parameter server architecture. The algorithm uses an asynchronous protocol to synchronize the parameters of each worker and improves the parameter update mechanism of existing asynchronous algorithms. The experimental results show that the algorithm performs well on non-convex optimization problems such as image classification and solves the non-convergence problem of asynchronous algorithms in a distributed environment. At the same time, it achieves the same accuracy as the stochastic gradient descent algorithm and improves the utilization of computing resources in the cluster.

This paper also analyzes heterogeneous environments and finds that the impact of high-latency update values on the global parameters is the main cause of the decrease in the algorithm's efficiency. To solve this problem, this paper proposes a distributed DASGD algorithm. The algorithm labels each update value with its delay. When the new global parameters are computed, high-delay update values are penalized, which reduces their effect on the global parameters. The experimental results show that the distributed DASGD algorithm is stable in heterogeneous environments.

For the experiments, we build a convenient browser-based distributed experimental environment to validate and test our algorithms. We implement the algorithms in JavaScript and implement the parameter server architecture by modifying the source code of the MLitB framework. We also implement two kinds of distributed stochastic gradient descent algorithms. Finally, we test our algorithms and draw conclusions.
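The delay-labeling idea described above can be illustrated with a minimal sketch. This is not the paper's implementation (which is written in JavaScript on top of MLitB); the class name `ParameterServer`, the version counter used to measure delay, and the penalty factor 1 / (1 + delay) are all assumptions chosen for illustration, since the abstract does not state the actual penalty function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParameterServer:
    """Toy parameter server (hypothetical) holding a flat parameter vector."""
    params: List[float]
    lr: float = 0.1
    version: int = 0  # number of updates applied to the global parameters

    def pull(self) -> Tuple[List[float], int]:
        """Return a copy of the parameters and the version they belong to."""
        return list(self.params), self.version

    def push(self, grad: List[float], pulled_version: int) -> int:
        """Apply a worker's gradient, penalizing it according to its delay.

        The delay is the number of global updates applied since the worker
        pulled its parameters. The 1 / (1 + delay) scaling is an assumed
        penalty, not the rule from the paper.
        """
        delay = self.version - pulled_version
        scale = self.lr / (1.0 + delay)
        self.params = [p - scale * g for p, g in zip(self.params, grad)]
        self.version += 1
        return delay


# Two workers pull the same parameters; the slow one pushes second.
server = ParameterServer(params=[1.0, -2.0], lr=0.5)
_, fast_version = server.pull()
_, slow_version = server.pull()
server.push([1.0, 1.0], fast_version)  # delay 0: full step of 0.5
server.push([1.0, 1.0], slow_version)  # delay 1: step halved to 0.25
print(server.params)
```

In this sketch an update computed from fresh parameters is applied at the full learning rate, while a stale update from a slower machine is scaled down, so a slow worker in a heterogeneous cluster has less influence on the global parameters.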