Research And Implementation Of High-Responsive Hadoop Computing Resource Scheduler Based On YARN

Posted on:2017-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

Full Text:PDF

GTID:2308330509457493

Subject:Computer technology

Abstract/Summary:

As Big Data processing technologies are applied in more and more fields, the responsiveness of Big Data processing platforms has become a focus of attention. Hadoop 2.0, the most popular open-source Big Data processing platform, adopts YARN to manage computing resources, which improves resource utilization and allows multiple programming frameworks to run on Hadoop. However, the current resource scheduling policies in YARN are designed for batch processing systems, so the responsiveness of the Hadoop YARN platform is inadequate. This dissertation therefore studies a high-responsiveness computing resource scheduler for Hadoop YARN.

This dissertation presents the FSPY (Fair Sojourn Protocol in YARN) scheduler. We inherit the idea of size-based scheduling and make several improvements and extensions suited to the resource management model and application characteristics of Hadoop YARN. FSPY gives the highest priority to the job with the smallest virtual size. Here, a job's size is the definite integral of its allocated resources over its life cycle, and its virtual size at a given time is its remaining size under a virtual Fair scheduler. To track jobs' virtual sizes, we design a job size prediction module and a job virtual size calculation module. The prediction module introduces a new job size prediction mechanism based on probing jobs and regression analysis; the virtual size calculation module uses a series of simple but effective algorithms. Furthermore, to alleviate the effect of job size prediction latency, jobs are scheduled under the fair policy until their sizes have been predicted.

The performance of the MapReduce job size prediction module and of the whole FSPY scheduler is evaluated through a series of realistic experiments. The job sets and data sets used in the experiments are synthesized by SWIM from the Facebook 2009 datacenter trace.
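The size-based rule described above can be illustrated with a minimal, hypothetical Python sketch of the Fair Sojourn Protocol idea; it is not FSPY's actual implementation. It assumes a homogeneous cluster of `capacity` containers, measures job size in container-seconds (the integral of allocated containers over time), and simulates the virtual Fair scheduler as equal sharing among jobs that have not yet finished virtually. As an assumption about the fair fallback, each job without a size prediction receives at most an equal share, and the remaining containers go to predicted jobs in ascending order of virtual remaining size. The class `FspSketch`, the `Job` record, and all method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    demand: int                             # containers the job could use right now
    predicted_size: Optional[float] = None  # container-seconds; None until predicted
    virtual_attained: float = 0.0           # service received in the virtual Fair system
    virtual_done: bool = False              # finished in the virtual Fair system
    real_done: bool = False                 # finished on the real cluster


class FspSketch:
    """Hypothetical FSP-style scheduler: simulate a virtual Fair scheduler and
    favour the job with the smallest remaining size in that simulation."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total containers (homogeneous cluster assumed)
        self.jobs: Dict[str, Job] = {}

    def submit(self, job_id: str, demand: int) -> None:
        self.jobs[job_id] = Job(job_id, demand)

    def set_prediction(self, job_id: str, size: float) -> None:
        self.jobs[job_id].predicted_size = size

    def finish(self, job_id: str) -> None:
        job = self.jobs[job_id]
        job.real_done = True
        if job.predicted_size is None:
            job.virtual_done = True  # assumption: never-predicted jobs leave the virtual system too

    def virtual_remaining(self, job: Job) -> Optional[float]:
        """Virtual size: predicted size minus service received in the virtual system."""
        if job.predicted_size is None:
            return None
        return max(job.predicted_size - job.virtual_attained, 0.0)

    def advance_virtual(self, dt: float) -> None:
        """Advance the virtual Fair scheduler by dt seconds, event by event:
        every job not yet finished virtually receives capacity / n containers."""
        while dt > 1e-12:
            active = [j for j in self.jobs.values() if not j.virtual_done]
            if not active:
                return
            rate = self.capacity / len(active)
            known = [r for r in map(self.virtual_remaining, active) if r is not None]
            # Stop early if a predicted job would finish virtually within dt,
            # so the freed share is redistributed from that moment on.
            step = min([dt] + [r / rate for r in known])
            for j in active:
                j.virtual_attained += rate * step  # accumulates the integral of allocated containers
                r = self.virtual_remaining(j)
                if r is not None and r <= 1e-9:
                    j.virtual_done = True
            dt -= step

    def allocate(self) -> Dict[str, int]:
        """Containers per running job for the next scheduling round."""
        live = [j for j in self.jobs.values() if not j.real_done]
        alloc = {j.job_id: 0 for j in live}
        if not live:
            return alloc
        free = self.capacity
        fair_share = self.capacity // len(live)
        # Fair fallback (assumption): unpredicted jobs get at most an equal share.
        for j in live:
            if j.predicted_size is None:
                alloc[j.job_id] = min(j.demand, fair_share)
                free -= alloc[j.job_id]
        # Size-based part: smallest virtual remaining size first (ties by job id).
        predicted = [j for j in live if j.predicted_size is not None]
        for j in sorted(predicted, key=lambda job: (self.virtual_remaining(job), job.job_id)):
            alloc[j.job_id] = min(j.demand, free)
            free -= alloc[j.job_id]
        return alloc


# Usage: job "b" is predicted to be small, so it is served before "a";
# "c" has no prediction yet and falls back to a fair share.
sched = FspSketch(capacity=10)
sched.submit("a", demand=10)
sched.submit("b", demand=10)
sched.submit("c", demand=4)
sched.set_prediction("a", 400.0)
sched.set_prediction("b", 60.0)
sched.advance_virtual(5.0)
print(sched.allocate())  # {'a': 0, 'b': 7, 'c': 3}
```

Simulating the virtual Fair scheduler event by event (stopping whenever a job would finish virtually) keeps each job's virtual size exact even when shares change mid-interval, which is what lets a job that the Fair scheduler would finish soon overtake larger jobs.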
Experimental results show that our prediction approach reaches a coefficient of determination (R²) of 0.97, which is sufficient for most size-based scheduling algorithms. When the cluster runs under a light workload, FSPY offers responsiveness similar to that of the Fair scheduler; under a heavy workload, FSPY significantly outperforms the Fair scheduler in responsiveness. FSPY also preserves fairness among jobs, as job slowdown stays within a narrow range. In general, the FSPY scheduler effectively improves the responsiveness of the Hadoop YARN platform while guaranteeing fairness in terms of job response time, which makes it valuable in practice.
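To make the two evaluation metrics concrete: R² (the coefficient of determination) compares the squared prediction errors with the variance of the true job sizes, so 1.0 means a perfect fit; slowdown compares a job's response time with a baseline. The sketch below is hypothetical. It fits a one-feature least-squares line from a probe-job measurement to full job size (the abstract does not state FSPY's actual regression features), defines slowdown as response time divided by the job's runtime alone on an idle cluster (an assumed definition), and uses made-up numbers rather than the thesis's data. The helpers `fit_line`, `r_squared`, and `slowdown` are illustrative names.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def slowdown(response_time: float, standalone_time: float) -> float:
    """Response time relative to running alone on an idle cluster (assumed definition)."""
    return response_time / standalone_time


# Made-up illustration data: probe-job size vs. full job size (container-seconds).
probe = [1.0, 2.0, 3.0, 4.0]
full = [10.0, 21.0, 29.0, 40.0]
slope, intercept = fit_line(probe, full)
pred = [slope * p + intercept for p in probe]
print(slope, intercept, r_squared(full, pred))
print(slowdown(30.0, 10.0))  # a job that took 3x its standalone runtime
```

A narrow slowdown range, as reported above, means no class of jobs (for example, large jobs under size-based priority) is delayed disproportionately relative to its own standalone runtime.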

Keywords/Search Tags:

computing resource scheduling, Hadoop YARN, responsiveness, fairness, MapReduce

Related items

1	Research On SLA-Aware Energy-Efficient Scheduling Strategy For Hadoop Yarn
2	Research On The Energy-Efficient Hadoop YARN Resource Scheduling Strategy Based On State Matrix
3	Research And Improvement Of Hadoop YARN Resource Allocation Mechanism
4	The Design And Implementation Of Dynamic Resource-aware Scheduling Algorithm On Yarn
5	Research On Resource Allocation And Scheduling In Hadoop YARN
6	Research And Application Of Resource Scheduling Algorithm In Hadoop
7	Design And Implementation Of YARN Resource Scheduling Strategy Optimization Method
8	Research And Implementation Of High Concurrent Opportunistic Resource Allocation In Hadoop YARN
9	Analysis Of Service Scheduling And Resource Scheduling Based On Cloud Computing
10	Design And Implementation Of The Hadoop Platform Benchmark Suite