
Algorithm Convergence Sensitivity Optimizing Technology For Distributed Machine Learning

Posted on: 2021-04-07    Degree: Master    Type: Thesis
Country: China    Candidate: Y C Fan    Full Text: PDF
GTID: 2428330605982486    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of science and technology, we have entered the era of big data. To alleviate the limitations of training machine learning models on a single machine with massive datasets, distributed machine learning has emerged. Distributed machine learning performs well in handling complex data, improving model accuracy, and expanding application domains. Current distributed machine learning frameworks can be divided into data-parallel, model-parallel, and hybrid-parallel approaches. In the data-parallel setting, the parameter server architecture is a research hotspot: the parameter server is responsible for aggregating sub-models, updating the global model, and handling communication, while the worker nodes are responsible for training the machine learning model. The aggregation strategy and the training algorithm are the key links that mainly determine the accuracy of the global model.

In distributed machine learning, the convergence sensitivity of the model directly affects the convergence speed and accuracy of training. On the worker nodes, stochastic optimization algorithms represented by Stochastic Gradient Descent (SGD) exhibit large convergence oscillation in the early training stage, which lowers the convergence sensitivity of the model and leaves room to improve the convergence speed. Because a distributed computing environment contains many sub-models, the other sub-models can be used to reduce the oscillation range during convergence and thereby improve convergence efficiency. On the parameter server, the model aggregation strategy indirectly affects the convergence sensitivity of the algorithm. During aggregation, the gradient contributions of the sub-models to the global model should differ, so that the global model can absorb the data characteristics of different sub-models as much as possible. In the data-parallel setting, the common Model Average aggregation strategy computes the global model as a plain average of the sub-models. This ignores the differences in the sub-models' gradient contributions to the global model and reduces convergence sensitivity, which in turn lowers convergence efficiency and training accuracy.

To address these problems, this thesis studies the training algorithm and the model aggregation strategy of distributed machine learning. For the training algorithm and model aggregation, Parallel Stochastic Gradient Reordering Descent and Ordered Weighted Average Model Aggregation are proposed, respectively. Based on these two contributions, the distributed machine learning framework Paradise is implemented. The main contents of this thesis are as follows:

(1) This thesis analyzes the influence of gradient oscillation on convergence sensitivity during the convergence of Stochastic Gradient Descent, and defines the oscillation range as an index to quantify the degree of this influence. Using other sub-models with different gradients in the distributed computing environment to adjust the descent direction of the local sub-model can bring the descent direction of the global model closer to the globally optimal direction and improve the convergence sensitivity of the model. Based on this idea, Parallel Stochastic Gradient Reordering Descent is proposed, which effectively improves the convergence sensitivity and convergence speed of the model; its convergence is proved theoretically.
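This abstract does not spell out the exact update rule of Parallel Stochastic Gradient Reordering Descent, so the following is only a minimal sketch of the underlying idea: a worker corrects its local SGD direction using gradients received from peer sub-models, which damps early-stage oscillation. The function name adjusted_descent_step, the blend coefficient, and the plain averaging of peer gradients are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

def adjusted_descent_step(w, local_grad, peer_grads, lr=0.01, blend=0.5):
    """One hypothetical descent step that damps oscillation by mixing the
    local sub-model gradient with gradients reported by peer workers.

    w          : current local model parameters (1-D array)
    local_grad : gradient computed on this worker's mini-batch
    peer_grads : list of recent gradients from other workers' sub-models
    blend      : assumed coefficient controlling how strongly peer
                 information corrects the local descent direction
    """
    if peer_grads:
        peer_mean = np.mean(peer_grads, axis=0)
        # Pull the local direction toward the consensus of the other
        # sub-models, reducing the early-stage oscillation of plain SGD.
        direction = (1.0 - blend) * local_grad + blend * peer_mean
    else:
        direction = local_grad
    return w - lr * direction
```

Whatever the actual reordering rule, the design point the abstract emphasizes is that peer sub-models with different gradients are used to narrow the oscillation range of the local descent direction.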
(2) This thesis analyzes the Model Average aggregation strategy commonly used in existing distributed machine learning frameworks. To address the problem that this strategy gives every sub-model the same gradient contribution, Model Average is improved: a gradient importance evaluation model is proposed that exploits the differing gradient importance of the sub-models to differentiate their contributions to the global model. Ordered Weighted Average Model Aggregation is then proposed on top of this evaluation model: the output of the gradient importance evaluation model is used as the weight in a weighted aggregation (as sketched below), which differentiates the gradient contributions of the sub-models within the global model, improves the convergence sensitivity of the model, and thereby improves convergence efficiency and training accuracy.

(3) Based on Parallel Stochastic Gradient Reordering Descent and Ordered Weighted Average Model Aggregation, the parallelism-oriented distributed machine learning framework Paradise is implemented. Experimental results show that the proposed training algorithm and model aggregation strategy effectively improve the convergence sensitivity of the model.
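To contrast the baseline Model Average with the proposed weighted aggregation on the parameter-server side, here is a minimal sketch. The abstract does not define the gradient importance evaluation model, so the gradient-norm score used below is only a stand-in assumption; the function names, the eps constant, and the normalization scheme are likewise hypothetical.

```python
import numpy as np

def plain_model_average(sub_models):
    """Baseline Model Average: every sub-model contributes equally
    to the global model."""
    return np.mean(sub_models, axis=0)

def ordered_weighted_average(sub_models, sub_grads, eps=1e-12):
    """Hypothetical weighted aggregation: each sub-model's contribution is
    scaled by an importance score derived from its gradient (the gradient
    norm is used here as a placeholder for the thesis's evaluation model).

    sub_models : list of sub-model parameter vectors from the workers
    sub_grads  : the corresponding gradients, used to score importance
    """
    scores = np.array([np.linalg.norm(g) for g in sub_grads]) + eps
    weights = scores / scores.sum()  # normalize so the weights sum to 1
    return np.average(sub_models, axis=0, weights=weights)
```

Whatever the actual importance score, the key difference from plain Model Average is that sub-models no longer contribute equally to the global model, so the global model can absorb the data characteristics of the more informative sub-models.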
Keywords/Search Tags: Distributed Machine Learning, Model Aggregation, Training Algorithm, Weighted Model Aggregation, Parallel Stochastic Gradient Reordering Descent