Geo-distributed Job Scheduling Techniques Via Multi-edge Collaboration

Posted on: 2021-01-13; Degree: Master; Type: Thesis
Country: China; Candidate: T T Wang; Full Text: PDF
GTID: 2428330647950902; Subject: Computer Science and Technology
Abstract/Summary:
Built on edge computing, geo-distributed big data analytics achieves a wider computing scale, a richer resource pool, and a more flexible serving mode. However, low latency, the critical goal of wide-area analytics optimization, is challenged by several factors in the multi-edge computing environment. Examples include the conflict between the limited bandwidth of an edge network and the raw-data upload demand it must carry, and the conflict between the unstable performance of edge resources and the low-latency requirements of jobs. This thesis focuses on these issues and develops solutions based on multi-edge collaboration that fairly ensure job completion times in multi-edge systems.

Because of their limited storage, edges cannot cache large amounts of raw data in advance as a data center can. Instead, raw data is uploaded only after the computing job is released. Meanwhile, each job may have data uploading and processing requirements at multiple geographically dispersed edges. Due to bandwidth constraints and computing limitations, jobs may compete for resources at these edges, and uncoordinated competition among multiple jobs may degrade overall performance. The thesis proposes Smart Dis, a geo-distributed job scheduling algorithm that coordinates the order of data uploading and processing for jobs' concurrent geo-distributed tasks in order to optimize completion time at the job level. The optimization opportunity is the skew among the completion times of a job's tasks. Such skew always exists because tasks differ in data volume and edge capability. Therefore, the faster tasks at some edges can be postponed without hurting job performance, since the job completion time depends on the completion of its last task. Theoretically, Smart Dis guarantees that the overall data upload time is no more than 3 times the optimal value. Simulations show that Smart Dis outperforms all of the previous geo-distributed job scheduling algorithms, achieving at least a 25% performance improvement.

Building on the schedule produced by Smart Dis, the thesis further finds that unstable edge resource performance is likely to cause parallel-processing jobs to suffer from slow tasks, resulting in a long tail in job completion times. Previous research has shown that task replication can tame the delay of slow tasks and speed up jobs. For resource-limited edge clusters, one may wish to replicate tasks not only locally but also remotely at other idle edges. However, existing replication algorithms are designed for a single cluster and are not applicable to the highly dynamic, highly heterogeneous multi-edge computing environment. The thesis introduces random variables to characterize the instability of resource performance at heterogeneous edges and designs Geo Clone, an online geo-distributed task replication algorithm. Geo Clone coordinates the task-replica requirements of multiple jobs, adapting to resource availability at the edges and the overall system load. The thesis rigorously proves that Geo Clone achieves an efficient competitive ratio. Finally, it implements a prototype for experiments and conducts large-scale simulations to evaluate Geo Clone comprehensively.
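
The observation behind Smart Dis, that a job's faster tasks can be postponed without delaying the job, can be illustrated with a toy example. The sketch below is not the Smart Dis algorithm. It assumes a simplified model in which each edge serves one job at a time in a fixed order, and the job names, edge names, and durations (`DEMAND`, `completion_times`) are hypothetical values chosen for illustration.

```python
from typing import Dict, List

# Hypothetical workload: time (seconds) each job needs on each edge
# to upload and process its data. These numbers are illustrative only.
DEMAND: Dict[str, Dict[str, float]] = {
    "job_a": {"edge1": 4.0, "edge2": 10.0},  # job_a's task on edge2 is its slowest
    "job_b": {"edge1": 3.0},                 # job_b only needs edge1
}


def completion_times(order: Dict[str, List[str]]) -> Dict[str, float]:
    """Serve jobs sequentially on each edge in the given order.

    Assumed model: one job at a time per edge, all edges start at time 0.
    A job completes when its last task completes.
    """
    finish = {job: 0.0 for job in DEMAND}
    for edge, jobs in order.items():
        clock = 0.0
        for job in jobs:
            clock += DEMAND[job][edge]
            finish[job] = max(finish[job], clock)
    return finish


# Order 1: edge1 serves job_a before job_b.
a_first = completion_times({"edge1": ["job_a", "job_b"], "edge2": ["job_a"]})
# Order 2: edge1 postpones job_a's (fast) task and serves job_b first.
b_first = completion_times({"edge1": ["job_b", "job_a"], "edge2": ["job_a"]})

print(a_first)  # {'job_a': 10.0, 'job_b': 7.0}
print(b_first)  # {'job_a': 10.0, 'job_b': 3.0}
```

In this toy case, postponing `job_a` on `edge1` leaves its completion time unchanged at 10 seconds, because its task on `edge2` finishes last, while `job_b` finishes at 3 seconds instead of 7.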
Keywords/Search Tags: Edge Computing, Geo-distributed Data Analytics, Parallel Scheduling, Multi-replica Execution