
Efficiency Optimization Mechanism For Geo-distributed Data Analytics

Posted on: 2021-10-28  Degree: Master  Type: Thesis
Country: China  Candidate: B Gao  Full Text: PDF
GTID: 2518306107460714  Subject: Computer system architecture
Abstract/Summary:
Many organizations, ranging from small start-ups to industry giants, run services on multiple data centers and/or edge clusters across the globe to provide low-latency access to users. The services deployed on these geo-distributed sites continuously produce massive amounts of data. Because the results of analyzing these data are of high value for decision making, conducting the analysis in a timely, low-cost, and resource-efficient manner becomes crucial. The key challenge of geo-distributed data analytics (GDA) is resource heterogeneity. Existing literature on GDA mainly concentrates on scheduling tasks and jobs to make the most efficient use of the given heterogeneous resources. These solutions assume that data centers are indistinguishable in their computation resources and have unlimited capacity to run any number of tasks. However, small-sized enterprises and organizations typically rent revocable server instances from a public cloud provider, and the rental cost dominates their operational expenditure. They must therefore carefully plan which computation instances to rent and how to schedule tasks on them to achieve a global benefit. This research exploits resource heterogeneity to jointly optimize server provisioning and task scheduling for geo-distributed analytics. The thesis formulates the problem as a mixed integer linear program, which is stochastic and is proved to be NP-hard. To address this challenge, the research first transforms the problem into a linear program with auxiliary parameters and then proposes a graph-based rounding theorem to construct a feasible solution. The effectiveness of the proposed solution is validated through both rigorous theoretical analysis and extensive trace-driven evaluations. Evaluations using a production trace from Wikipedia show that, compared with the baselines, both completion time and cost are significantly reduced.
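The joint decision described in the abstract (how many instances to rent at each site, and where to run each task) can be illustrated on a toy instance. The sketch below is not the thesis's method: it uses exhaustive search instead of the linear programming relaxation and graph-based rounding, and it ignores stochasticity and instance revocation. The site names, prices, speeds, task sizes, the `COST_WEIGHT` trade-off, and the assumption that a site's load is split evenly across its instances are all hypothetical.

```python
from itertools import product

# Hypothetical toy instance; every number here is an illustrative assumption.
SITES = {
    "A": {"price": 3.0, "speed": 2.0},  # rental cost and work rate per instance
    "B": {"price": 1.0, "speed": 1.0},
}
TASKS = [4.0, 2.0, 2.0]  # work units of each task
MAX_INSTANCES = 2        # at most this many instances may be rented per site
COST_WEIGHT = 1.0        # assumed weight trading rental cost against time


def evaluate(instances, assignment):
    """Return (completion_time, rental_cost), or None if infeasible.

    Assumes the load placed on a site is shared evenly by its instances.
    """
    load = {site: 0.0 for site in SITES}
    for work, site in zip(TASKS, assignment):
        load[site] += work
    finish = 0.0
    for site, total in load.items():
        if total == 0:
            continue
        if instances[site] == 0:
            return None  # tasks were placed on a site with nothing rented
        rate = instances[site] * SITES[site]["speed"]
        finish = max(finish, total / rate)
    cost = sum(instances[s] * SITES[s]["price"] for s in SITES)
    return finish, cost


def solve():
    """Search all provisioning and scheduling choices for the toy instance."""
    names = list(SITES)
    best = None
    for counts in product(range(MAX_INSTANCES + 1), repeat=len(names)):
        instances = dict(zip(names, counts))
        for assignment in product(names, repeat=len(TASKS)):
            result = evaluate(instances, assignment)
            if result is None:
                continue
            finish, cost = result
            objective = finish + COST_WEIGHT * cost
            if best is None or objective < best[0]:
                best = (objective, instances, assignment, finish, cost)
    return best


if __name__ == "__main__":
    objective, instances, assignment, finish, cost = solve()
    print(instances, assignment, finish, cost, objective)
```

Exhaustive search is only workable at this scale, since the number of combinations grows exponentially with the number of tasks and sites. That growth is the practical reason a relaxation-and-rounding approach of the kind the abstract mentions is attractive for realistic problem sizes.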
Keywords/Search Tags:Geo-distributed Analytics, Server Provisioning, Task Scheduling, Resource Heterogeneity, Public Cloud