
Effective Straggler Mitigation With Cross-layer Interference-aware Optimization

Posted on: 2020-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
Full Text: PDF
GTID: 2518306518963449
Subject: Software engineering
Abstract/Summary:
In-memory data processing frameworks that use the MapReduce programming model (e.g., Spark) make big data analysis much simpler and more efficient. However, stragglers, i.e., tasks that take much longer to finish than other tasks, significantly degrade performance. Multiple factors cause stragglers, arising from either the hardware resource layer or the application layer, e.g., hardware heterogeneity, interference, data locality and data skew. While state-of-the-art straggler mitigation techniques offer partial solutions for data skew and data locality, we experimentally demonstrate that the other factors can also cause serious problems, and that their combined effect makes the situation worse. We present Clio, a cross-layer interference-aware optimization system that can effectively mitigate stragglers for data processing frameworks. Clio supports the scheduling of both map and reduce tasks. It heuristically dispatches intermediate data in proportion to the actual computing ability of each worker node, which is estimated from a task performance model that accounts for the various straggler factors, so that task completion times are balanced at a much finer granularity. We implement Clio in Apache Spark and evaluate its performance on a variety of data applications using both synthetic and real datasets. Experimental results show that Clio can speed up the execution of applications by up to 67% compared with existing algorithms. Clio can therefore be used in practice to speed up data processing tasks more efficiently in various scenarios, reduce straggler problems and improve resource utilization.
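The abstract does not describe Clio's dispatching heuristic in detail, so the following is only a minimal sketch of the general idea of dispatching intermediate data in proportion to each worker's estimated computing ability. It is not Clio's algorithm. The function name `dispatch_proportionally`, the worker names and the capacity values are hypothetical, and the capacity estimates are assumed to come from some external performance model. The sketch uses a plain greedy rule: take key groups from largest to smallest and give each one to the worker whose estimated finish time (assigned load divided by capacity) would be lowest.

```python
from typing import Dict, List


def dispatch_proportionally(
    key_sizes: Dict[str, float],
    capacities: Dict[str, float],
) -> Dict[str, List[str]]:
    """Assign key groups to workers so that load / capacity stays balanced.

    key_sizes:  hypothetical size of the intermediate data per key group.
    capacities: hypothetical relative computing ability per worker
                (assumed to be produced by a performance model).
    """
    if not capacities or any(c <= 0 for c in capacities.values()):
        raise ValueError("every worker needs a positive capacity estimate")

    assignment: Dict[str, List[str]] = {w: [] for w in capacities}
    load: Dict[str, float] = {w: 0.0 for w in capacities}

    # Place the largest key groups first; ties are broken by key name.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # Choose the worker with the lowest estimated finish time after
        # taking this group; ties are broken by worker name.
        best = min(
            capacities,
            key=lambda w: ((load[w] + size) / capacities[w], w),
        )
        assignment[best].append(key)
        load[best] += size

    return assignment
```

For example, with key group sizes `{"a": 60, "b": 30, "c": 20, "d": 10}` and capacities `{"fast": 2.0, "slow": 1.0}`, the sketch gives the faster worker 80 units of data and the slower worker 40, matching the 2:1 capacity ratio. Real key groups rarely divide this evenly, which is why a heuristic rather than an exact split is needed.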
Keywords/Search Tags: Straggler mitigation, Spark, scheduling, key partitioning