
Joint Scheduling Of Data And Computation In Geo-distributed Cloud Systems

Posted on: 2015-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yin
Full Text: PDF
GTID: 2348330485494390
Subject: Computer technology
Abstract/Summary:
Recent trends show that cloud computing increasingly spans globally distributed data centers. For geo-distributed data centers, placing tasks in the appropriate data centers is one of the most difficult problems. Moreover, because of the characteristics of scientific computing, scheduling algorithms must jointly consider each task's input data and computation. The scheduler must therefore deal with wide-area distributed data, data sharing among tasks, WAN bandwidth costs, and data center capacity limits, while also minimizing completion time. However, this kind of scheduling problem is known to be NP-hard.

In this paper, inspired by real applications in astronomy, we propose a two-phase scheduling algorithm that addresses these challenges. In the mapping phase, tasks and their input data are modeled as a hypergraph; the tasks are then partitioned into groups according to the data-sharing relations among them, and each group is dispatched, in one-to-one correspondence, to the data center that offers it the maximum data locality, which minimizes the volume of data transfers. The reassigning phase balances completion time across data centers: guided by the data-sharing relations between tasks and groups, it reassigns several tasks while keeping the resulting increase in data transfers as small as possible. The goal of the reassigning phase is to minimize the overall completion time.

We evaluate our proposal using the real China-Astronomy-Cloud model and typical applications. Simulations show that our algorithm improves completion time by up to 23%. Furthermore, experiments show that the proposed algorithm reduces the amount of data transfer more effectively than other similar scheduling algorithms under different conditions, such as different load ratios and different communication-to-computation ratios.
Keywords/Search Tags: geo-distributed data centers, cloud computing, scheduling algorithm, data-sharing, compute- and data-intensive
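
The two phases described in the abstract can be illustrated with a small Python sketch. It is not the thesis's algorithm: the abstract gives no implementation details, so a greedy grouping stands in for real hypergraph partitioning, the one-to-one matching is an exhaustive search that only suits a handful of data centers, completion time is modeled as the sum of task runtimes per data center, and WAN bandwidth costs and capacity limits are left out. All names (`map_phase`, `reassign_phase`, `transfer_volume`, `makespan`) and the toy data are hypothetical.

```python
from itertools import permutations

def transfer_volume(tasks, dc, inputs, location, size):
    """Volume of distinct input data that must be moved into `dc` for `tasks`."""
    needed = set().union(*(inputs[t] for t in tasks))
    return sum(size[d] for d in needed if location[d] != dc)

def makespan(placement, runtime):
    """Assumed completion-time model: the largest summed runtime of any data center."""
    return max(sum(runtime[t] for t in ts) for ts in placement.values())

def map_phase(inputs, location, size, dcs):
    """Group tasks by shared data, then match groups to data centers one-to-one."""
    k = len(dcs)
    cap = -(-len(inputs) // k)  # ceiling division keeps group sizes roughly equal
    groups = [[] for _ in range(k)]
    group_data = [set() for _ in range(k)]
    # Greedy stand-in for hypergraph partitioning: largest inputs first.
    order = sorted(inputs, key=lambda t: (-sum(size[d] for d in inputs[t]), t))
    for t in order:
        open_groups = [g for g in range(k) if len(groups[g]) < cap]
        # Prefer the group sharing the most data; break ties toward emptier groups.
        best = max(open_groups, key=lambda g: (
            sum(size[d] for d in inputs[t] & group_data[g]), -len(groups[g]), -g))
        groups[best].append(t)
        group_data[best] |= inputs[t]

    def local(g, dc):  # volume of group g's data already stored in dc
        return sum(size[d] for d in group_data[g] if location[d] == dc)

    # Exhaustive one-to-one matching that maximizes total data locality.
    perm = max(permutations(range(k)),
               key=lambda p: sum(local(g, dcs[p[g]]) for g in range(k)))
    return {dcs[perm[g]]: groups[g] for g in range(k)}

def reassign_phase(placement, inputs, location, size, runtime):
    """Move tasks from the busiest to the idlest data center at minimal extra transfer."""
    placement = {dc: list(ts) for dc, ts in placement.items()}

    def load(dc):
        return sum(runtime[t] for t in placement[dc])

    while True:
        busy, idle = max(placement, key=load), min(placement, key=load)
        # Only moves that leave the receiver below the sender's current load.
        movable = [t for t in placement[busy] if load(idle) + runtime[t] < load(busy)]
        if not movable:
            return placement

        def extra(t):  # increase in transferred volume caused by moving t
            before = (transfer_volume(placement[busy], busy, inputs, location, size)
                      + transfer_volume(placement[idle], idle, inputs, location, size))
            rest = [u for u in placement[busy] if u != t]
            after = (transfer_volume(rest, busy, inputs, location, size)
                     + transfer_volume(placement[idle] + [t], idle, inputs, location, size))
            return after - before

        t = min(movable, key=lambda t: (extra(t), t))
        placement[busy].remove(t)
        placement[idle].append(t)

# Toy instance (made up for illustration): two data centers, four tasks.
dcs = ["A", "B"]
inputs = {"t1": {"d1", "d2"}, "t2": {"d1"}, "t3": {"d3"}, "t4": {"d3", "d4"}}
location = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
size = {"d1": 1, "d2": 1, "d3": 1, "d4": 1}
runtime = {"t1": 4, "t2": 4, "t3": 1, "t4": 1}

mapped = map_phase(inputs, location, size, dcs)
final = reassign_phase(mapped, inputs, location, size, runtime)
print(mapped, makespan(mapped, runtime))
print(final, makespan(final, runtime))
```

On this toy instance the mapping phase keeps every task next to its data but leaves one data center far busier than the other; the reassigning phase then trades a small amount of extra data transfer for a shorter makespan, which is the trade-off the abstract describes. The loop assumes strictly positive task runtimes.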