A Scalable And Efficient Reliable Group Data Delivery Service For Data Centers

Posted on: 2014-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Cao
GTID: 1228330398964270
Subject: Computer software and theory
Abstract/Summary:
Reliable Group Data Delivery (RGDD) is a pervasive traffic pattern in data centers. In an RGDD group, a sender needs to reliably deliver a copy of the data to all receivers. RGDD is widely used in modern data center systems, e.g., GFS, Amazon EC2, and Windows Azure, and it plays an important role in the performance of these systems.

In data centers, there are a large number of RGDD groups, and each group contains a large number of receivers. These RGDD groups contribute a large amount of traffic. Therefore, RGDD should be both scalable and efficient.

Existing solutions are not suitable for RGDD in data centers. They either do not scale to the large number of RGDD groups (e.g., reliable IP multicast) or cannot use network bandwidth efficiently (e.g., end-host-based overlay systems). In addition, they do not leverage data center network topology information, so they cannot fully utilize the network bandwidth.

Recently, two clear technical trends have emerged in data center networks. 1) Today's data center fabrics (e.g., BCube, CamCube) provide multiple edge-disjoint Steiner trees for RGDD. 2) Network devices are able to do in-network packet caching, since they integrate CPUs and memory. These trends provide new opportunities for RGDD designs in data center networks.

In this thesis, by exploring the design space opened up by these new technical trends, we propose Datacast for RGDD in data centers. Datacast is a centralized system that aims to provide a scalable and efficient RGDD service for data centers. Datacast comprises the following work:

1. Existing solutions cannot calculate enough edge-disjoint Steiner trees for data centers in a short time. To address this problem, we design a multiple edge-disjoint Steiner trees algorithm for data center networks. The algorithm first creates multiple edge-disjoint spanning trees based on the topology information, then prunes these spanning trees, and finally repairs the broken Steiner trees by using BFS (Breadth First Search). The algorithm has very low time complexity and can create enough edge-disjoint Steiner trees even when there are network failures.

2. To deliver data efficiently in each Steiner tree, we design a single-rate multicast congestion control algorithm. By leveraging the in-network packet caching ability, the algorithm proposes using duplicate interests as congestion signals and AIMD (Additive Increase and Multiplicative Decrease) for multicast congestion control. The Datacast congestion control algorithm effectively synchronizes receivers and helps Datacast achieve scalability and high bandwidth efficiency. It effectively solves the classical single-rate multicast congestion control problem.

3. We build a fluid model for the Datacast congestion control algorithm. The model captures the essence of the algorithm, AIMD. By analyzing the model, we derive two theorems that describe the cache requirement and the bandwidth efficiency, respectively. We find that Datacast is able to work at full rate when the cache size is greater than a small threshold (e.g., 125 KB), and that it causes few duplicate data transmissions (e.g., 1.19%). The model shows theoretically that the Datacast congestion control algorithm helps Datacast achieve scalability and high bandwidth efficiency.

4. To support data delivery when only some network devices support data packet caching, we propose a Datacast incremental deployment strategy. In this strategy, we first design an algorithm with linear time complexity that increases the maximum transmission rate of a Steiner tree. By using the reverse links of the original Steiner tree to let receivers retrieve data from peer nodes, the algorithm greatly increases the maximum transmission rate of each Steiner tree. We then propose adding auxiliary caching nodes to prevent false congestion signals. The incremental deployment strategy enables Datacast to work at a high transmission rate when only some network devices support data packet caching.

5. We implement Datacast in NS3 and also build it on the ServerSwitch platform. We evaluate it with both simulations and experiments. The results confirm our theoretical analysis and also suggest that Datacast achieves scalability and high bandwidth efficiency.
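
The pruning step mentioned in item 1 can be illustrated with a minimal sketch. It assumes a spanning tree given as a list of undirected edges and a set of terminals (the sender plus the receivers), and it removes non-terminal leaves until none remain. The function name `prune_to_steiner_tree` and the edge representation are hypothetical and chosen only for illustration. The sketch does not cover how the thesis constructs the edge-disjoint spanning trees or how it repairs broken trees with BFS.

```python
from collections import defaultdict


def prune_to_steiner_tree(tree_edges, terminals):
    """Prune a spanning tree to the subtree that connects `terminals`.

    Illustrative sketch only (hypothetical helper, not the thesis's code):
    repeatedly delete leaf nodes that are neither the sender nor a receiver.
    Node labels must be mutually comparable (e.g. all strings).
    """
    # Build an undirected adjacency map of the spanning tree.
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)

    terminals = set(terminals)
    # Start from every leaf that is not a terminal.
    stack = [n for n in list(adj) if len(adj[n]) == 1 and n not in terminals]

    while stack:
        leaf = stack.pop()
        if leaf not in adj or len(adj[leaf]) != 1:
            continue  # already removed, or no longer a leaf
        (parent,) = adj[leaf]
        adj[parent].discard(leaf)
        del adj[leaf]
        # The parent may have become a removable leaf itself.
        if len(adj[parent]) == 1 and parent not in terminals:
            stack.append(parent)

    # Return each remaining edge once, as a sorted tuple.
    return {tuple(sorted((u, v))) for u in adj for v in adj[u]}
```

Each node is pushed and removed at most once, so the pruning runs in time linear in the size of the spanning tree. For example, calling the function on a tree in which switches `b`, `c`, and `e` lead to no receiver drops those branches and keeps only the paths from the sender to the receivers.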
Keywords/Search Tags: Multicast, P2P, data center, data distribution