
Research On RDMA Network Resource Multiplexing And Application Acceleration For Data Centers

Posted on: 2020-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: H N Qiu
Full Text: PDF
GTID: 2428330575952562
Subject: Computer Science and Technology
Abstract/Summary:
Remote Direct Memory Access (RDMA) allows applications to access remote memory while bypassing the remote CPU, and offloads the protocol stack to the NIC to provide ultra-low latency and reduced CPU cost. However, on the one hand, because RDMA stores network connection state in the NIC's space-limited cache to accelerate network I/O, frequent cache misses under a large number of concurrent connections degrade network I/O performance. The existing solution uses a mutex to share low-level network connections among multiple local threads, reducing the number of connections to the same remote node, but it incurs non-negligible lock-contention overhead and cannot provide fair service to multiple threads. On the other hand, network communication is often the performance bottleneck of distributed machine learning. Current methods for accelerating distributed applications consider only the transport service type and primitive selection for small data transfers, ignoring the semantic mismatch of the network API and other factors that influence RDMA performance, including network connection parameters, network buffer management, and PCIe efficiency. As a result, these methods cannot provide systematic and effective acceleration for distributed machine learning applications.

Our first work addresses the performance bottleneck of the current mutex-based resource multiplexing method and its inability to share the receive queue among multiple applications, while meeting the demand for fair service under resource sharing. We share the low-level network resources at the system layer, design an abstract connection, and process requests asynchronously to provide network service to multiple applications. We also show experimentally that head-of-line blocking and coarse-grained scheduling problems exist on current NICs, and we provide fair service among multiple applications and traffic classes with different priorities by adding
traffic splitting and a fair queuing algorithm at the resource-sharing layer. Our resource-sharing method removes the bottleneck caused by lock contention, shares the low-level network resources among all applications on the same machine, effectively reduces the amount of resources used, and provides fair service for connections.

Our second work addresses the communication bottleneck of the distributed machine learning framework MXNet. We design an RDMA network API compatible with socket semantics, optimize the queue pipeline depth parameter, design an application-agnostic network buffer management module, and use pipelined memory copy and on-demand paging to adaptively optimize the latency of small data transfers and the CPU overhead of large data transfers, providing a systematic acceleration solution for MXNet.

We implement the system and test it in a real environment to demonstrate the effectiveness of our methods. Through experiments and analysis, our resource sharing method removes the overhead of lock contention, preserves RDMA's native primitive performance while multiplexing a single QP among 1024 abstract connections, and guarantees fair service. Our new RDMA API provides 5x to 9x higher performance for the MXNet parameter server and 2x for a distributed MNIST algorithm.
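To make the resource-sharing idea concrete, the following is a minimal sketch of multiplexing many abstract connections over one shared queue pair with deficit-round-robin fair queuing, so that a connection with a large backlog cannot head-of-line block its neighbors. The thesis does not specify its exact scheduling algorithm; the class names (`AbstractConnection`, `SharedQP`), the quantum value, and the use of deficit round robin are all illustrative assumptions, not the thesis's implementation.

```python
from collections import deque

class AbstractConnection:
    """Illustrative abstract connection layered above one shared QP."""
    def __init__(self, conn_id, weight=1):
        self.conn_id = conn_id
        self.weight = weight     # relative priority of this connection
        self.deficit = 0         # DRR deficit counter, in bytes
        self.pending = deque()   # messages queued for transmission

    def send(self, payload: bytes):
        # Enqueue without touching the NIC; the scheduler drains later.
        self.pending.append(payload)

class SharedQP:
    """One underlying queue pair multiplexed by many abstract connections."""
    QUANTUM = 1024               # bytes credited per round per weight unit

    def __init__(self):
        self.conns = []

    def attach(self, conn):
        self.conns.append(conn)

    def poll_once(self):
        """One fair-queuing round: returns (conn_id, payload) pairs in
        the order they would be posted to the shared QP."""
        sent = []
        for c in self.conns:
            if not c.pending:
                continue
            c.deficit += self.QUANTUM * c.weight
            # Drain as many whole messages as the deficit allows.
            while c.pending and len(c.pending[0]) <= c.deficit:
                msg = c.pending.popleft()
                c.deficit -= len(msg)
                sent.append((c.conn_id, msg))
            if not c.pending:
                c.deficit = 0    # idle connections do not hoard credit
        return sent
```

For example, if connection `a` queues four 1024-byte messages and connection `b` queues one 512-byte message, a single `poll_once()` round emits one message from each, so `b` is served in the first round instead of waiting behind `a`'s backlog.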
Keywords/Search Tags: Remote Direct Memory Access, Resource Multiplexing, Application Acceleration, Data Center Network