Optimization And Implementation Of Data Transmission Strategy Based On RDMA

Posted on:2019-02-18

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2428330611493263

Subject:Computer Science and Technology

Abstract/Summary:

In the era of big data, data volumes keep growing, and big data processing frameworks and deep learning platforms continue to evolve so that more efficient and useful information can be extracted from massive data. Apache Spark is a lightning-fast unified analytics engine for large-scale data processing. However, shuffling data across stages in a cluster is time-consuming, because it places a significant burden on the operating system at both the source and the destination by requiring many remote file accesses and network I/O operations. As a result, Spark cannot fully exploit the performance benefits offered by high-speed interconnects, and the shuffle has become the major performance bottleneck of Apache Spark. For deep learning platforms, researchers quickly recognized that deep learning has characteristics very similar to those of large-scale HPC applications; thus, beginning in 2016, the established MPI interface became the de facto portable communication standard in distributed deep learning. Meanwhile, as hardware prices have dropped dramatically, most data centers today are equipped with modern high-performance interconnects such as InfiniBand, which can be used to design more efficient data transmission strategies that achieve high bandwidth and low latency. To accelerate the data transmission of Spark Shuffle and improve the training efficiency of deep learning, this dissertation studies and optimizes data transmission strategies based on RDMA features. The main contributions of this dissertation are as follows:

(1) Accelerating Spark Shuffle with RDMA. We present a hybrid approach for Apache Spark that combines communication over conventional sockets with RDMA over InfiniBand. Our analysis focuses on the architecture and data flow of the shuffle phase. We then propose a new RDMA-based design that provides a tiered memory pool and uses SEND/RECV to transfer small messages and RDMA READ to transfer large messages. Because the acceleration is applied at the data transfer layer, this native RDMA-based data shuffle can benefit Spark workloads transparently.

(2) Research and optimization of a high-performance collective communications library based on RDMA. Mobius is a high-performance distributed collective communications library for deep learning. Its logical architecture is divided into four layers: the transport layer, the algorithm layer, the policy layer, and the input layer. The transport layer encapsulates TCP socket and RDMA communication; the algorithm layer calls the transport layer to transfer data; and the policy layer dynamically selects the communication protocol and the communication algorithm. The Mobius collective communications library outperforms the Gloo library, but a gap to NCCL2 remains.
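To make the small/large split in contribution (1) concrete, the following is a minimal Python model of a tiered memory pool combined with size-based verb selection. It does not call real RDMA verbs. The 16 KiB cutoff, the power-of-two tier sizes, and the names `TieredMemoryPool` and `choose_verb` are hypothetical assumptions for illustration, not details taken from the dissertation.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical cutoff: blocks at or below this size are sent inline via SEND/RECV.
SMALL_MSG_THRESHOLD = 16 * 1024


@dataclass
class Buffer:
    tier_size: int  # capacity of the tier this buffer belongs to
    registered: bool = True  # stands in for a pre-registered RDMA memory region


class TieredMemoryPool:
    """Illustrative model: pre-registered buffers grouped in power-of-two
    size classes, so registration cost is paid once rather than per message."""

    def __init__(self, min_size=4 * 1024, max_size=4 * 1024 * 1024, per_tier=4):
        self.tiers = {}
        size = min_size
        while size <= max_size:
            self.tiers[size] = deque(Buffer(size) for _ in range(per_tier))
            size *= 2

    def tier_for(self, nbytes):
        # Smallest tier large enough to hold the request.
        for size in sorted(self.tiers):
            if size >= nbytes:
                return size
        raise ValueError(f"request of {nbytes} bytes exceeds the largest tier")

    def acquire(self, nbytes):
        size = self.tier_for(nbytes)
        free = self.tiers[size]
        # Hand out a pooled buffer, or create a new one if the tier is empty.
        return free.popleft() if free else Buffer(size)

    def release(self, buf):
        # Return the buffer to its tier for reuse.
        self.tiers[buf.tier_size].append(buf)


def choose_verb(nbytes):
    """Small blocks are copied into a two-sided SEND; large blocks are
    advertised so the reducer can pull them with a one-sided RDMA READ."""
    return "SEND/RECV" if nbytes <= SMALL_MSG_THRESHOLD else "RDMA_READ"


pool = TieredMemoryPool()
buf = pool.acquire(10_000)  # served from the 16 KiB tier
print(buf.tier_size, choose_verb(10_000), choose_verb(1 << 20))
pool.release(buf)
```

The design intuition is that small blocks fit comfortably in a pre-posted receive buffer, while a one-sided READ lets the reducer pull large blocks without involving the mapper's CPU in the copy.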
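The layering described in contribution (2) can be illustrated with a small single-process Python simulation in which a policy function chooses both the transport and the all-reduce algorithm for each call. This is a sketch under stated assumptions. The two algorithms (a reduce-then-broadcast and a ring all-reduce), the 1024-element threshold, and all function names are hypothetical and are not Mobius's actual API. The chosen transport is only recorded in the plan, not exercised, because all ranks live in one process.

```python
from dataclasses import dataclass

# Hypothetical cutoff (in elements) for switching to the ring algorithm.
RING_THRESHOLD = 1024


@dataclass
class Plan:
    transport: str  # "rdma" or "tcp" (recorded only; not simulated here)
    algorithm: str  # "reduce_broadcast" or "ring"


def select_plan(num_elements, rdma_available):
    """Policy layer (illustrative rules): prefer RDMA when present, and use the
    bandwidth-oriented ring algorithm only for large buffers."""
    transport = "rdma" if rdma_available else "tcp"
    algorithm = "ring" if num_elements >= RING_THRESHOLD else "reduce_broadcast"
    return Plan(transport, algorithm)


def reduce_broadcast(buffers):
    """Sum all buffers at one root, then give every rank a copy (few steps,
    suited to small, latency-bound messages)."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def ring_allreduce(buffers):
    """Simulated ring all-reduce: p-1 reduce-scatter steps, then p-1 all-gather
    steps. Rank r only ever sends to rank (r + 1) % p."""
    p, n = len(buffers), len(buffers[0])
    data = [list(b) for b in buffers]
    bounds = [c * n // p for c in range(p + 1)]  # chunk c is bounds[c]:bounds[c+1]

    # Reduce-scatter: at step s, rank r sends chunk (r - s) % p to its successor,
    # which adds it in. Afterwards rank r holds the full sum of chunk (r + 1) % p.
    for s in range(p - 1):
        snap = [list(d) for d in data]  # all sends within one step are simultaneous
        for r in range(p):
            c, dst = (r - s) % p, (r + 1) % p
            for i in range(bounds[c], bounds[c + 1]):
                data[dst][i] += snap[r][i]

    # All-gather: at step s, rank r forwards completed chunk (r + 1 - s) % p,
    # and the successor overwrites its copy.
    for s in range(p - 1):
        snap = [list(d) for d in data]
        for r in range(p):
            c, dst = (r + 1 - s) % p, (r + 1) % p
            for i in range(bounds[c], bounds[c + 1]):
                data[dst][i] = snap[r][i]
    return data


def allreduce(buffers, rdma_available=True):
    """Hypothetical entry point standing in for the input layer: ask the
    policy layer for a plan, then run the selected algorithm."""
    plan = select_plan(len(buffers[0]), rdma_available)
    algo = ring_allreduce if plan.algorithm == "ring" else reduce_broadcast
    return plan, algo(buffers)


plan, result = allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
print(plan, result)
```

The split mirrors a common trade-off: a root-based scheme needs few steps, while the ring keeps every link busy and sends about 2(p-1)/p of the buffer per rank, which pays off for large gradients.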

Keywords/Search Tags:

RDMA, InfiniBand, Spark Shuffle, Collective Communications Library

Related items

1	Optimization And Implementation Of Data Transmission Mechanism Based On RDMA
2	Shuffle Performance Optimization Of Spark Based On RDMA Technology
3	Optimal Design And Implementation Of RDMA-based On Big Data System
4	Research On Key Technologies And Application On YARN For High-Performance Computing
5	Research On Shuffle Mechanism In Spark Cluster
6	Research On Spark Shuffle Process Performance Optimization
7	The Design And Implementation Of HDFS Based On Infiniband
8	Design And Implementation Of Distributed Key-Value Storage System Based On RDMA
9	Performance Optimization Methods For Shuffle Process Of Spark Platform
10	Analysis And Optimization Of Memory Scheduling Algorithm Of Spark Shuffle