
Resource Scheduling in Geo-Distributed Computing

Posted on: 2018-03-16
Degree: Ph.D
Type: Thesis
University: University of Southern California
Candidate: Hung, Chien-Chun
Full Text: PDF
GTID: 2448390002999192
Subject: Computer Science
Abstract/Summary:
Due to the growing needs in computing and the increasing volume of data, cloud service providers deploy multiple datacenters around the world in order to provide fast computing response. Applications that use such geo-distributed deployments include web search, user behavior analysis, machine learning, and live camera feed processing. Depending on the characteristics of the applications, their data may be generated, stored, and processed across the geo-distributed sites. Hence, efficient processing of the data across the geo-distributed sites is critical to the applications' performance. Existing solutions first aggregate all the required data at one location and execute the computation within that site. Such solutions incur large amounts of data transfer across the WAN and lead to prolonged response times for the applications due to significant network delays. An emerging trend is to instead distribute the computation across the sites based on the data distribution, and aggregate only the results afterwards. Recent works have shown that such an approach can significantly improve response time and reduce WAN bandwidth usage. However, the performance of geo-distributed jobs depends heavily on how the resources are scheduled, which raises new challenges, as trivial extensions of state-of-the-art scheduling solutions lead to sub-optimal performance.

In this thesis, we first improve the performance of geo-distributed jobs from the perspective of computation resources. We provide insights into how conventional Shortest Remaining Processing Time (SRPT) scheduling falls short due to the lack of scheduling coordination among the sites, and propose a light-weight heuristic that significantly improves the jobs' response time. We also design a new job scheduling heuristic that coordinates the workload demands and the resource availability among the sites and greedily schedules the job that can finish quickly. The trace-driven simulation studies show that our proposed scheduling heuristics reduce the response time of geo-distributed jobs by up to 50%.

Next, we address the geo-distributed jobs' performance from the perspectives of both the computation and the network resources. Specifically, we address the scheduling challenge posed by the heterogeneity of resource availability across the sites and the mismatch of the data distribution across the geo-distributed sites. We formulate the task placement decisions using a Linear Programming optimization model, and allocate the resources greedily to the job that can finish quickly. In addition to the response time, our design can also easily incorporate other performance goals, e.g., fairness and WAN usage, with simple control knobs. The EC2-based deployment of our prototype and the large-scale trace-driven simulations showed that our solutions can improve the response time over a baseline in-place scheduling approach by up to 77%, and over the state-of-the-art geo-distributed analytics solution by up to 55%.

Finally, we expand to a more general setting in which each job has multiple configuration options, and its quality depends on the configuration it uses. We motivate this problem with the scenario of processing live camera feeds across hierarchical clusters. In this setting we focus on the scheduling problem of jointly determining job configuration and placement for concurrent jobs, and design an efficient heuristic to maximize the overall quality with the available resources across the geo-distributed sites. Our evaluation based on an Azure deployment of our prototype showed that the proposed solution outperforms the state-of-the-art video analytics scheduler by up to 2.3X and the widely deployed Fair Scheduler by up to 15.7X, in terms of the average quality of the concurrent jobs.
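
The abstract's point about SRPT lacking coordination among sites can be illustrated with a toy example. The sketch below is not the thesis's heuristic: the job names, site names, work units, and the "bottleneck-first" ordering rule are all hypothetical, chosen only to show why ordering jobs independently at each site can hurt a job whose tasks span several sites. It assumes every job arrives at time 0 and each site runs one job at a time, so SRPT reduces to shortest-job-first at each site.

```python
from typing import Dict, List

# Hypothetical model: a job maps each site name to the work it needs there.
Job = Dict[str, int]


def sites_of(jobs: Dict[str, Job]) -> List[str]:
    """All sites that appear in any job, in a stable order."""
    return sorted({site for job in jobs.values() for site in job})


def completion_times(jobs: Dict[str, Job],
                     order_at_site: Dict[str, List[str]]) -> Dict[str, int]:
    """Each site runs jobs one at a time in the given order.
    A geo-distributed job completes only when its last site finishes."""
    done = {name: 0 for name in jobs}
    for site, order in order_at_site.items():
        clock = 0
        for name in order:
            work = jobs[name].get(site, 0)
            clock += work
            if work > 0:
                done[name] = max(done[name], clock)
    return done


def per_site_srpt(jobs: Dict[str, Job]) -> Dict[str, List[str]]:
    """Uncoordinated: each site sorts jobs by its own local remaining work."""
    return {site: sorted(jobs, key=lambda n: (jobs[n].get(site, 0), n))
            for site in sites_of(jobs)}


def bottleneck_first(jobs: Dict[str, Job]) -> Dict[str, List[str]]:
    """Coordinated stand-in (an assumption, not the thesis's heuristic):
    all sites share one order, sorted by each job's largest per-site work."""
    order = sorted(jobs, key=lambda n: (max(jobs[n].values()), n))
    return {site: list(order) for site in sites_of(jobs)}


def average(values) -> float:
    values = list(values)
    return sum(values) / len(values)


# Hypothetical workload: three jobs with skewed work across two sites.
jobs = {
    "J1": {"east": 2, "west": 8},
    "J2": {"east": 4, "west": 4},
    "J3": {"east": 8, "west": 2},
}

independent = completion_times(jobs, per_site_srpt(jobs))
coordinated = completion_times(jobs, bottleneck_first(jobs))

print(f"per-site SRPT average response time: {average(independent.values()):.2f}")
print(f"coordinated average response time:   {average(coordinated.values()):.2f}")
```

In this toy workload, per-site SRPT finishes J1 early at one site and late at the other, so the early slot is wasted; the shared ordering lowers the average response time from about 11.3 to 10 time units. Real workloads involve arrivals over time, multiple slots per site, and network transfers, none of which this sketch models.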
Keywords/Search Tags:Geo-distributed, Scheduling, Data, Response time, Job