
Use RDMA To Accelerate The Distributed Deep Learning

Posted on: 2020-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2428330623463629
Subject: Computer technology
Abstract/Summary:
Deeper models and larger datasets are two major ingredients for applying deep learning (DL) to real-world problems, which inevitably shifts model training from a single GPU card to GPU clusters due to limited GPU memory and time-to-solution requirements. High-speed, low-latency RDMA-capable network fabrics such as InfiniBand and RoCE play an important role in coping with the enormous amount of data exchanged during training. DL frameworks are built upon these fabrics through various APIs, including IPoIB, MPI, and RDMA Verbs. Trade-offs between performance and usability arise when adapting DL frameworks to RDMA-capable networks, and improper design choices may yield high-performance yet hard-to-maintain and hard-to-merge code. This paper presents our approach to adapting MXNet, a modular and versatile DL framework, to RDMA-capable networks. Dividing the training process in MXNet into P2P communication and AllReduce communication, we add incremental optimizations to its message-passing code. Experiments show that our approach exhibits near-linear speedups, with parallel efficiency reaching 96% compared to 53% for the original IPoIB version when scaling to 100 GPU cards. In contrast to other MPI-based porting approaches, our modifications are confined to MXNet's Parameter Server module and are transparent to upper-layer operations, thus sacrificing no features such as auto recovery and flexible consistency.
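The abstract splits training communication into P2P and AllReduce phases. As an illustration only (not the thesis's actual RDMA Verbs implementation inside MXNet's Parameter Server), the sketch below simulates ring AllReduce, the communication pattern commonly mapped onto RDMA fabrics: a reduce-scatter phase followed by an all-gather phase, for 2*(n-1) steps in total. All names here are hypothetical.

```python
def ring_allreduce(worker_data):
    """Sum-reduce equal-length vectors across n simulated workers on a ring."""
    n = len(worker_data)
    length = len(worker_data[0])
    chunk = length // n
    data = [list(v) for v in worker_data]  # copy: leave caller buffers untouched

    def seg(idx):
        # Index range of segment idx; the last segment absorbs any remainder.
        start = idx * chunk
        return start, (length if idx == n - 1 else start + chunk)

    # Reduce-scatter: at step t, worker i forwards segment (i - t) mod n to its
    # right neighbour, which accumulates it. Sends are snapshotted first to
    # mimic all workers transmitting simultaneously.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            s, e = seg((i - step) % n)
            sends.append(((i + 1) % n, s, data[i][s:e]))
        for dst, s, payload in sends:
            for off, v in enumerate(payload):
                data[dst][s + off] += v

    # All-gather: worker i now owns the fully reduced segment (i + 1) mod n;
    # circulate the reduced segments so every worker ends with the full sum.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            s, e = seg((i + 1 - step) % n)
            sends.append(((i + 1) % n, s, e, data[i][s:e]))
        for dst, s, e, payload in sends:
            data[dst][s:e] = payload

    return data
```

On a real RDMA fabric, each entry in `sends` would correspond to a one-sided RDMA write into the neighbour's pre-registered buffer, which is where the latency advantage over IPoIB comes from.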
Keywords/Search Tags: RDMA, Deep Learning, Network