
Optimized Design and Implementation of RDMA-Based Big Data Systems

Posted on: 2016-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: L L Liang
GTID: 2348330536967367
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, data volumes grow daily and computer systems have changed significantly: many applications must scale out to large clusters to meet urgent demands for large-scale computation and storage. However, the diversity of applications, such as stream processing and iterative and interactive algorithms, leaves general-purpose systems with very poor performance, which has driven the development of many computing frameworks customized for these workloads. Spark is an open-source, memory-based cluster computing system that offers a richer programming model than Hadoop's MapReduce and can quickly iterate many times over data sets held in memory, which makes it well suited to iterative machine learning and interactive data analysis. Tachyon, a distributed in-memory file system, both relieves memory pressure on Spark and improves Spark's ability to read and write large volumes of data in memory.

Today, mainstream network data transmission in big data systems is built on Ethernet socket programming over the TCP/IP protocol. This traditional approach copies data multiple times between user buffers and kernel buffers, which requires a great deal of memory bandwidth and many read and write operations. It consumes CPU cycles and memory bandwidth and greatly increases end-to-end latency, so the network largely limits the transmission performance of the system. Network communication is therefore becoming a critical performance bottleneck for big data systems and a major problem to be solved, especially for network I/O-intensive applications.

This thesis exploits RDMA (Remote Direct Memory Access) network technology and its low-latency, operating-system-bypass features, together with the advantages of the InfiniBand high-performance interconnect, to rebuild the network data communication modules of Spark and Tachyon, thereby improving the operating efficiency of big data systems and reducing CPU utilization.

First, for the NIO-based communication mechanism used by Spark's Shuffle network data transmission, an RDMA-based network data transmission module is designed and implemented. The module is further optimized for issues such as the delay in establishing new RDMA connections and the transfer rate of larger messages through the message pool: connections are established automatically and the data message pool is adaptive, which improves the utilization of the high-performance network. Experiments show that the RDMA-based Spark system not only significantly improves overall system efficiency but also reduces average memory usage and CPU utilization.

Second, for the remote read operation in Tachyon's network data transmission module, an RDMA-based data transmission scheme is designed and implemented. On the one hand, the request is designed as a zero-byte message to reduce network traffic overhead; on the other hand, data is transferred as a stream of blocks, which reduces repeated data requests. Laboratory tests show that RDMA-based remote reads significantly improve Tachyon's performance while reducing CPU utilization.
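The abstract does not describe how the adaptive data message pool is built, so the following is only an illustrative sketch of the general idea, not the thesis's design. It assumes that message buffers are grouped into power-of-two size classes and reused rather than reallocated. In a real RDMA stack each buffer would also be registered with the network adapter, and reuse would avoid repeating that registration; registration is not modeled here. The class name `AdaptiveBufferPool`, its methods, and the default sizes are all hypothetical.

```python
from collections import defaultdict


class AdaptiveBufferPool:
    """Hypothetical pool of reusable message buffers grouped by size class."""

    def __init__(self, min_size=4096, max_size=1 << 20):
        self.min_size = min_size          # smallest buffer handed out (assumed)
        self.max_size = max_size          # largest message the pool accepts (assumed)
        self._free = defaultdict(list)    # size class -> idle buffers
        self.allocations = 0              # buffers created because none was idle

    def _size_class(self, nbytes):
        # Round the request up to the next power-of-two multiple of min_size.
        size = self.min_size
        while size < nbytes:
            size *= 2
        return size

    def acquire(self, nbytes):
        """Return a buffer of at least nbytes, reusing an idle one if possible."""
        if nbytes > self.max_size:
            raise ValueError("message larger than the pool's maximum buffer")
        size = self._size_class(nbytes)
        idle = self._free[size]
        if idle:
            return idle.pop()
        # No idle buffer in this class: the pool grows on demand.
        self.allocations += 1
        return bytearray(size)

    def release(self, buf):
        """Hand a buffer back so a later message of similar size can reuse it."""
        self._free[len(buf)].append(buf)
```

For example, acquiring a buffer for a 5000-byte message, releasing it, and then acquiring one for a 6000-byte message returns the same 8192-byte buffer, so only one allocation takes place.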
Keywords/Search Tags:RDMA, InfiniBand, Socket, Spark, Tachyon