
On Network Optimization Technology For MXNet-based Large-scale Distributed Machine Learning

Posted on: 2021-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Sun | Full Text: PDF
GTID: 2428330614967668 | Subject: Engineering
Abstract/Summary:
Big data and large models have laid a solid material foundation for the rapid development of artificial intelligence, while also raising new technical challenges. Single-machine training falls far short of the required computing power and storage resources. Distributed machine learning uses multiple nodes in a computer cluster to train simultaneously and cooperatively, speeding up the learning process. However, communication between nodes often becomes the bottleneck of a distributed machine learning system. To achieve more efficient communication, this thesis proposes network optimization techniques from two perspectives: communication pace and communication topology. The main contributions are as follows.

First, this thesis designs a hybrid parallel distributed machine learning algorithm based on delay processing. Distributed machine learning algorithms can be divided into synchronous and asynchronous algorithms according to their communication pace, each with its own advantages and disadvantages. To better balance training speed and convergence accuracy, this thesis proposes a hybrid parallel communication structure that combines synchronous and asynchronous parallelism, and further optimizes the asynchronous part with delay processing. The new algorithm achieves a training speedup close to that of the asynchronous algorithm while exceeding the synchronous algorithm in convergence accuracy, offering a new approach to designing communication mechanisms for distributed machine learning.

Second, this thesis designs a topology-aware AllReduce algorithm. The biggest disadvantage of parameter-server-based distributed machine learning systems is that the server nodes become communication bottlenecks. To circumvent this problem, this thesis applies the AllReduce algorithm from the field of high-performance computing to distributed machine learning, adopting the Recursive Halving and Doubling AllReduce algorithm, which has lower time complexity than Ring AllReduce. On this basis, a topology-awareness module is designed to make full use of the high-bandwidth connections between nodes while weakening the impact of low-bandwidth connections, effectively increasing training speed.
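The abstract does not detail the delay-processing scheme used in the asynchronous part. One common way to handle gradient staleness in asynchronous training is to damp the step size by the gradient's delay; the sketch below is a minimal single-process illustration of that idea, and the damping rule `base_lr / (1 + staleness)` is an illustrative assumption, not the thesis's actual method.

```python
def stale_sgd_update(weights, grad, base_lr, staleness):
    """Apply a gradient whose staleness (in update steps) damps the step size.

    A fresh gradient (staleness 0) uses the full learning rate; a gradient
    delayed by k steps is applied with base_lr / (1 + k), so very stale
    gradients perturb the model less.
    """
    lr = base_lr / (1.0 + staleness)
    return [w - lr * g for w, g in zip(weights, grad)]

# A fresh gradient takes a full step; a gradient delayed by 3 steps is damped.
w = [1.0, 2.0]
w_fresh = stale_sgd_update(w, [0.5, 0.5], base_lr=0.1, staleness=0)
w_stale = stale_sgd_update(w, [0.5, 0.5], base_lr=0.1, staleness=3)
```

In a real parameter-server setting, `staleness` would be the difference between the server's current iteration count and the iteration at which the worker pulled its weights.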
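The thesis's implementation is not shown here, but the Recursive Halving and Doubling scheme it adopts can be sketched as a single-process simulation: a reduce-scatter phase by recursive halving followed by an allgather phase by recursive doubling, taking 2·log2(p) communication rounds instead of the 2(p-1) rounds of Ring AllReduce. The sketch assumes a power-of-two node count; function and variable names are illustrative.

```python
def halving_doubling_allreduce(vectors):
    """Simulate Recursive Halving and Doubling AllReduce in one process.

    `vectors[r]` plays the role of node r's local gradient; the return value
    gives each node's buffer after the allreduce (every row is the sum).
    """
    p, n = len(vectors), len(vectors[0])
    assert p > 0 and p & (p - 1) == 0, "node count must be a power of two"
    data = [list(v) for v in vectors]
    bounds = [(0, n)] * p  # slice of the vector each node is still reducing

    # Phase 1: reduce-scatter by recursive halving (distance p/2, p/4, ..., 1).
    # Each node exchanges half of its active slice with a partner and keeps
    # the reduced half, so after log2(p) rounds node r holds 1/p of the sum.
    dist = p // 2
    while dist >= 1:
        snap = [row[:] for row in data]  # state before this round's exchanges
        for r in range(p):
            partner = r ^ dist
            lo, hi = bounds[r]
            mid = (lo + hi) // 2
            # The lower-ranked member of each pair keeps the lower half.
            keep = (lo, mid) if r < partner else (mid, hi)
            for i in range(*keep):
                data[r][i] = snap[r][i] + snap[partner][i]
            bounds[r] = keep
        dist //= 2

    # Phase 2: allgather by recursive doubling (distance 1, 2, ..., p/2).
    # Partners swap their fully reduced slices, doubling what each node owns.
    dist = 1
    while dist < p:
        snap = [row[:] for row in data]
        owned = bounds[:]
        for r in range(p):
            partner = r ^ dist
            plo, phi = owned[partner]
            for i in range(plo, phi):
                data[r][i] = snap[partner][i]
            lo, hi = bounds[r]
            bounds[r] = (min(lo, plo), max(hi, phi))
        dist *= 2
    return data
```

The topology-awareness module described in the abstract would sit on top of this: choosing which ranks are paired at each distance so that the large early exchanges traverse high-bandwidth links.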
Keywords/Search Tags: distributed machine learning, synchronous parallel, asynchronous parallel, gradient delay, AllReduce