
On Network Optimization Technology For MXNet-based Large-scale Distributed Machine Learning

Posted on: 2021-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Sun | Full Text: PDF
GTID: 2428330614967668 | Subject: Engineering
Abstract/Summary:
Big data and large models have laid a solid material foundation for the rapid development of artificial intelligence, while also raising new technical challenges. Single-machine training falls far short of the required computing power and storage resources. Distributed machine learning uses multiple nodes in a computer cluster to train simultaneously and cooperatively, speeding up the learning process. However, communication between nodes often becomes the bottleneck of a distributed machine learning system. To achieve more efficient communication, this thesis proposes network optimization techniques from two perspectives: communication pace and communication topology. The main contributions are as follows.

First, this thesis designs a hybrid parallel distributed machine learning algorithm based on delay processing. Distributed machine learning algorithms can be divided into synchronous and asynchronous algorithms according to their communication pace, each with its own advantages and disadvantages. To better balance training speed and convergence accuracy, this thesis proposes a hybrid parallel communication structure that combines synchronous and asynchronous parallelism, and further optimizes the asynchronous part with delay processing. The new algorithm achieves a training speedup close to that of the asynchronous algorithm while exceeding the synchronous algorithm in convergence accuracy, offering a new approach to designing communication mechanisms for distributed machine learning.

Second, this thesis designs a topology-aware AllReduce algorithm. The biggest disadvantage of parameter-server-based distributed machine learning systems is that the server nodes become communication bottlenecks. To circumvent this problem, this thesis applies the AllReduce algorithm from the field of high-performance computing to distributed machine learning, adopting the Recursive Halving and Doubling AllReduce algorithm, which has lower time complexity than Ring AllReduce. On this basis, a topology-awareness module is designed to make full use of the high-bandwidth connections between nodes while weakening the impact of low-bandwidth connections, effectively increasing training speed.
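The abstract does not detail the delay-processing scheme used in the asynchronous part. One common way to handle gradient staleness in asynchronous training is to damp the step size by the gradient's delay; the sketch below is a minimal single-process illustration of that idea, and the damping rule `base_lr / (1 + staleness)` is an illustrative assumption, not the thesis's actual method.

```python
def stale_sgd_update(weights, grad, base_lr, staleness):
    """Apply a gradient whose staleness (in update steps) damps the step size.

    A fresh gradient (staleness 0) uses the full learning rate; a gradient
    delayed by k steps is applied with base_lr / (1 + k), so very stale
    gradients perturb the model less.
    """
    lr = base_lr / (1.0 + staleness)
    return [w - lr * g for w, g in zip(weights, grad)]

# A fresh gradient takes a full step; a gradient delayed by 3 steps is damped.
w = [1.0, 2.0]
w_fresh = stale_sgd_update(w, [0.5, 0.5], base_lr=0.1, staleness=0)
w_stale = stale_sgd_update(w, [0.5, 0.5], base_lr=0.1, staleness=3)
```

In a real parameter-server setting, `staleness` would be the difference between the server's current iteration count and the iteration at which the worker pulled its weights.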
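The thesis's implementation is not shown here, but the Recursive Halving and Doubling scheme it adopts can be sketched as a single-process simulation: a reduce-scatter phase by recursive halving followed by an allgather phase by recursive doubling, taking 2·log2(p) communication rounds instead of the 2(p-1) rounds of Ring AllReduce. The sketch assumes a power-of-two node count; function and variable names are illustrative.

```python
def halving_doubling_allreduce(vectors):
    """Simulate Recursive Halving and Doubling AllReduce in one process.

    `vectors[r]` plays the role of node r's local gradient; the return value
    gives each node's buffer after the allreduce (every row is the sum).
    """
    p, n = len(vectors), len(vectors[0])
    assert p > 0 and p & (p - 1) == 0, "node count must be a power of two"
    data = [list(v) for v in vectors]
    bounds = [(0, n)] * p  # slice of the vector each node is still reducing

    # Phase 1: reduce-scatter by recursive halving (distance p/2, p/4, ..., 1).
    # Each node exchanges half of its active slice with a partner and keeps
    # the reduced half, so after log2(p) rounds node r holds 1/p of the sum.
    dist = p // 2
    while dist >= 1:
        snap = [row[:] for row in data]  # state before this round's exchanges
        for r in range(p):
            partner = r ^ dist
            lo, hi = bounds[r]
            mid = (lo + hi) // 2
            # The lower-ranked member of each pair keeps the lower half.
            keep = (lo, mid) if r < partner else (mid, hi)
            for i in range(*keep):
                data[r][i] = snap[r][i] + snap[partner][i]
            bounds[r] = keep
        dist //= 2

    # Phase 2: allgather by recursive doubling (distance 1, 2, ..., p/2).
    # Partners swap their fully reduced slices, doubling what each node owns.
    dist = 1
    while dist < p:
        snap = [row[:] for row in data]
        owned = bounds[:]
        for r in range(p):
            partner = r ^ dist
            plo, phi = owned[partner]
            for i in range(plo, phi):
                data[r][i] = snap[partner][i]
            lo, hi = bounds[r]
            bounds[r] = (min(lo, plo), max(hi, phi))
        dist *= 2
    return data
```

The topology-awareness module described in the abstract would sit on top of this: choosing which ranks are paired at each distance so that the large early exchanges traverse high-bandwidth links.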
Keywords/Search Tags: distributed machine learning, synchronous parallel, asynchronous parallel, gradient delay, AllReduce