Research On The Bulk Cloud Data Placement And Transfer Scheduling Among Datacenters

Posted on: 2015-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Wang
Full Text: PDF
GTID: 1228330467464312
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of cloud services around the world, the amount of data stored in cloud platforms has increased from terabyte scale to petabyte scale. To provide a highly available and reliable service, cloud data items are usually replicated and placed in geographically distributed datacenters. To keep the data replicas consistent, data migration and data backup are periodically executed among datacenters. However, because of the sheer volume of data, bulk data transfers impose a heavy load on the links between datacenters and dominate the aggregate inter-datacenter traffic. This forces cloud providers to lease more network resources from Internet Service Providers (ISPs) to accommodate this soaring bulk traffic. As a consequence, the network transfer cost has increased substantially.

To address these problems, this thesis finds that the network topology, link utilization, and link pricing scheme play critical roles in placing and transferring data items. Unfortunately, when managing these data items, most cloud providers mainly consider how to design a framework that can handle such big data, such as MapReduce, but fail to take the network status into account to obtain an optimized strategy. Therefore, this thesis proposes a network-aware framework for bulk data placement and transfer scheduling. The contributions are summarized as follows:

● We propose a bulk dependent data placement algorithm for geo-distributed networks. The algorithm tries to separate the data items into groups according to their dependencies, such that highly dependent data items stay in the same group. Taking the characteristics of the network topology into account, highly dependent data items are always placed in the same datacenter, or in datacenters that are connected by high-capacity links. In this way, the consumption of network resources is reduced as much as possible. The simulation results show that our strategy can significantly decrease the consumption of network resources.

● We propose a multiple bulk data scheduling algorithm for inter-datacenter networks. Combining the delay-tolerant nature of inter-datacenter bulk transfers with the temporal and spatial characteristics of bulk data traffic, we investigate the problem of scheduling multiple bulk data transfers. To improve the utilization of network resources, we aim to complete the transfers using spare network resources. Temporally, we apply the store-and-forward transfer mode to reduce the peak traffic load on each link. Spatially, we propose to lexicographically minimize the congestion of all links among datacenters. The simulation results show that our algorithm can significantly reduce network congestion as well as balance the traffic across the entire network.

● We propose a joint placement and routing algorithm for reducing the bulk data transfer cost in inter-datacenter networks. The algorithm formulates the problem by adding virtual nodes and links, such that the placement and routing problems can be jointly optimized. Combining the shipping and Internet pricing schemes, the algorithm tries to minimize the transfer cost while satisfying the link capacity constraints. The simulation results show that, compared with routing-only or placement-only optimization, our algorithm can significantly reduce the network transfer cost.
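As a rough illustration of the virtual-node idea in the third contribution, the sketch below shows how a single shortest-path computation can choose a placement and a route together. It is not the thesis's formulation: the function name `joint_placement_and_route`, the per-datacenter `placement_cost` values, the undirected per-unit link costs, and the use of Dijkstra's algorithm are all assumptions made for illustration, and the link capacity constraints mentioned in the abstract are ignored.

```python
import heapq


def joint_placement_and_route(links, placement_cost, destination):
    """Choose a source datacenter and a route to `destination` together.

    links:          {(u, v): cost} per-unit transfer cost (assumed undirected)
    placement_cost: {datacenter: cost} hypothetical cost of placing the data there
    Returns (chosen_datacenter, route, total_cost).
    """
    virtual = "__virtual_source__"  # hypothetical virtual node name

    # Build an adjacency list from the real links.
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    # Virtual links: leaving the virtual node towards datacenter d
    # "pays" the placement cost of d.
    graph[virtual] = list(placement_cost.items())

    # Plain Dijkstra from the virtual node.
    dist = {virtual: 0.0}
    prev = {}
    heap = [(0.0, virtual)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if destination not in dist:
        raise ValueError("destination unreachable from any candidate datacenter")

    # Walk back from the destination to the virtual node.
    path = [destination]
    while path[-1] != virtual:
        path.append(prev[path[-1]])
    path.reverse()

    route = path[1:]  # drop the virtual node
    return route[0], route, dist[destination]


if __name__ == "__main__":
    links = {("A", "B"): 4, ("B", "C"): 1, ("A", "C"): 6}
    placement_cost = {"A": 1, "B": 3}
    print(joint_placement_and_route(links, placement_cost, "C"))
    # -> ('B', ['B', 'C'], 4.0)
```

In this toy instance, placing the data at A is cheaper in isolation (cost 1 versus 3), but the joint optimum is B, because the total of placement plus transfer cost is 4 rather than 6. This is the kind of gain over placement-only or routing-only optimization that the abstract reports.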
Keywords/Search Tags: cloud data, inter-datacenter, high utility data placement, high utility data transfer, low cost data transfer