
The Design And Implementation Of Dynamic Resource-aware Scheduling Algorithm On Yarn

Posted on: 2018-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2348330518495287
Subject: Information and Communication Engineering
Abstract/Summary:
Big data is becoming more and more prominent in industry. Hadoop, as a basic big data platform, is used as core infrastructure software by several large IT companies around the world. Yarn, the resource management and job scheduling mechanism in Hadoop, uses a configured scheduling algorithm to allocate resources and schedule tasks for the applications that users submit. The quality of the scheduling policy directly affects the cluster's performance and the user experience, so studying and improving Yarn's scheduling algorithms has important practical significance.

Yarn currently ships with three scheduling algorithms. Based on an in-depth study of the three built-in scheduling algorithms and their respective implementations, this thesis presents a new scheduling algorithm, named PFT (Performance Fairness Tradeoff), to overcome the defects of the three built-in scheduling algorithms and of other existing algorithms. The PFT algorithm balances cluster performance against the fairness of resource allocation and shortens the execution time of applications. The main research results are as follows.

(1) Analyzes in detail the technologies related to Hadoop Yarn, including HDFS, MapReduce, and Yarn.
(2) Studies in depth, at the source-code level, the three built-in scheduling algorithms of Yarn and their respective implementations, and analyzes the advantages and disadvantages of each.
(3) Presents a new scheduling algorithm, named PFT (Performance Fairness Tradeoff), to overcome the defects of the three built-in algorithms and of other existing algorithms. The PFT algorithm can balance cluster performance against the fairness of resource allocation and shorten the execution time of applications.
(4) Develops a scheduler based on the proposed PFT algorithm and configures it on a Hadoop cluster.
(5) Builds a stable Hadoop cluster and compares the PFT algorithm with the three built-in algorithms in terms of the cluster's CPU usage, memory usage, and application execution time.

The experimental results show that the proposed PFT scheduling algorithm significantly improves both cluster performance and the fairness of resource allocation.
Keywords/Search Tags: Yarn, job scheduling, cluster performance, fairness of resource allocation, balance