
Research And Implementation Of High-Responsive Hadoop Computing Resource Scheduler Based On YARN

Posted on: 2017-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2308330509457493
Subject: Computer technology
Abstract/Summary:
As Big Data processing technologies are applied in more and more fields, the responsiveness of Big Data processing platforms has become a focus of attention. As the most popular open-source Big Data processing platform, Hadoop 2.0 adopts YARN to manage computing resources, which improves resource utilization and makes multiple programming frameworks available on Hadoop. However, the current resource scheduling policies in YARN are designed for batch processing systems, which leaves the responsiveness of the Hadoop YARN platform inadequate. Therefore, this dissertation studies a highly responsive computing resource scheduler for Hadoop YARN.
This dissertation presents the FSPY (Fair Sojourn Protocol in YARN) scheduler. We inherit the idea of size-based scheduling and make several improvements and extensions according to the resource management pattern and application properties of Hadoop YARN. FSPY gives the highest priority to the job with the smallest virtual size. Here, job size is the definite integral of the resources allocated to a job over its life cycle, and a job's virtual size is its remaining size under a virtual Fair scheduler at a given time. To track jobs' virtual sizes, we design a job size prediction module and a job virtual size calculation module. In the job size prediction module, we present a new mechanism for job size prediction based on a probing job and regression analysis. In the job virtual size calculation module, we propose a series of simple but effective algorithms. Furthermore, to alleviate the effect of job size prediction latency, we schedule jobs by the fair policy until their sizes are predicted.
The performance of the MapReduce job size prediction module and of the whole FSPY scheduler is assessed through a series of realistic experiments. The job sets and data sets used in our experiments are synthesized by SWIM based on the Facebook 2009 datacenter trace.
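The virtual-size idea described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Job`, `advance_virtual_fair`, and `pick_next` are hypothetical, sizes are assumed to be measured in resource-seconds (for example container-seconds), and the virtual Fair scheduler is assumed to split a fixed `capacity` equally among all jobs that are unfinished in the virtual system.

```python
from dataclasses import dataclass

EPS = 1e-9  # values below this are treated as zero (illustrative tolerance)


@dataclass
class Job:
    name: str
    predicted_size: float           # assumed unit: resource-seconds
    virtual_remaining: float = -1.0  # remaining size in the virtual fair system

    def __post_init__(self):
        # A new job starts with its whole predicted size still to be served.
        if self.virtual_remaining < 0:
            self.virtual_remaining = self.predicted_size


def advance_virtual_fair(jobs, capacity, dt):
    """Age jobs as if an ideal fair scheduler shared `capacity` equally for `dt` seconds."""
    time_left = dt
    while time_left > EPS:
        active = [j for j in jobs if j.virtual_remaining > 0]
        if not active:
            break
        share = capacity / len(active)
        # Stop at the moment the smallest job finishes virtually, because
        # the per-job share changes at that point.
        step = min(time_left, min(j.virtual_remaining for j in active) / share)
        for j in active:
            j.virtual_remaining -= share * step
            if j.virtual_remaining < EPS:
                j.virtual_remaining = 0.0
        time_left -= step


def pick_next(jobs):
    """Return the job with the smallest virtual size.

    `jobs` should hold only jobs that are still unfinished on the real cluster.
    """
    return min(jobs, key=lambda j: j.virtual_remaining)
```

For example, with `jobs = [Job("a", 10.0), Job("b", 40.0)]` and `capacity = 10.0`, calling `advance_virtual_fair(jobs, 10.0, 1.0)` charges each job 5 resource-seconds, after which `pick_next(jobs)` selects job `a`. A job whose virtual size has reached zero but which is still running on the real cluster keeps the highest priority under this rule.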
Experimental results show that our prediction approach reaches an R² prediction accuracy of 0.97, which is sufficient for most size-based scheduling algorithms. When the cluster runs under a light workload, the FSPY scheduler has responsiveness similar to that of the Fair scheduler; when the cluster runs under a heavy workload, the FSPY scheduler significantly outperforms the Fair scheduler in terms of responsiveness. The FSPY scheduler also guarantees fairness among jobs, as job slowdown is restricted to a narrow range. In general, the FSPY scheduler effectively improves the responsiveness of the Hadoop YARN platform and guarantees fairness in terms of job response time, which makes it valuable in practice.
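The two quantities reported above, R² for size prediction and job slowdown for fairness, can be computed as in the following minimal Python sketch. The abstract does not give its exact formulas, so the sketch assumes the standard coefficient of determination and the common definition of slowdown as response time divided by the job's runtime on an otherwise idle cluster. The one-variable least-squares fit and all function names (`fit_line`, `r_squared`, `slowdown`) are illustrative assumptions, not the thesis's regression model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def slowdown(response_time, isolated_runtime):
    """Assumed definition: response time relative to running alone on the cluster."""
    return response_time / isolated_runtime
```

Under these assumptions, a scheduler is fair in the sense used above when the slowdown values of all jobs fall within a narrow range, whether the jobs are small or large.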
Keywords/Search Tags: computing resource scheduling, Hadoop YARN, responsiveness, fairness, MapReduce