The Research And Optimization Of JOB Schedule Algorithm In Hadoop

Posted on: 2017-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: L Bao  Full Text: PDF
GTID: 2348330503492871  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet industry, Big Data technologies have been widely adopted. As a major parallel computing platform for massive data, Hadoop faces increasingly stringent performance demands. The Hadoop scheduler is in charge of scheduling jobs and resources, and its algorithm determines the performance of the cluster. Hence it is crucial to study and optimize the job scheduling algorithms of Hadoop. This thesis carries out research on job scheduling algorithms, and the main work is as follows:

1. Current deadline-based job scheduling algorithms do not work on Hadoop YARN. This thesis proposes a job scheduling algorithm based on the YARN architecture that maximizes the number of jobs meeting their response-time limits. The algorithm predicts the minimum number of concurrent tasks needed to guarantee a job's response time and divides the cluster resources according to this value. By applying a different scheduling method to each of the two resulting kinds of resources, the algorithm maximizes resource utilization without jeopardizing the time limits of jobs. Experiments show that the algorithm effectively increases the number of jobs completed before their deadlines.

2. To reduce the network overhead generated during the Shuffle phase, this thesis proposes a Reduce task scheduling algorithm. By analyzing how data locality and the amount of data transferred affect the network overhead of the Shuffle phase, an index is proposed to measure the network overhead generated by a Reduce task. The algorithm selects the node on which to run each Reduce task based on this index. To avoid node overload caused by skew in the placement of Reduce tasks, the algorithm provides a scheme for selecting spare nodes. Experiments show that the algorithm effectively reduces the network overhead of the Shuffle phase and shortens the response time of jobs that spend a larger proportion of their time in the Reduce phase.
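
The two ideas in the abstract can be illustrated with a rough sketch. The abstract does not give the thesis's actual prediction model or overhead index, so everything below is an assumption made for illustration: the function names are hypothetical, tasks are assumed to run in equal-length waves, and the overhead index is simplified to the number of bytes a Reduce task would have to fetch from other nodes.

```python
import math


def min_concurrent_tasks(num_tasks, avg_task_seconds, seconds_to_deadline):
    """Estimate the fewest parallel task slots that finish a job in time.

    Assumption (not from the thesis): every task takes avg_task_seconds and
    tasks run in waves, so one slot can run floor(deadline / task time) tasks.
    Returns None when not even a single task fits before the deadline.
    """
    if avg_task_seconds <= 0:
        raise ValueError("avg_task_seconds must be positive")
    if num_tasks <= 0:
        return 0
    waves = int(seconds_to_deadline // avg_task_seconds)
    if waves < 1:
        return None
    return math.ceil(num_tasks / waves)


def remote_shuffle_bytes(partition_bytes_by_node, candidate):
    """Simplified overhead index: bytes the Reduce task must pull over the
    network if it runs on `candidate` (data already on that node is local)."""
    return sum(
        size for node, size in partition_bytes_by_node.items() if node != candidate
    )


def pick_reduce_node(partition_bytes_by_node, free_slots):
    """Pick the node with the lowest index that still has a free slot.

    Nodes without free slots are skipped, so the next-best node acts as the
    spare. Ties are broken by node name to keep the choice deterministic.
    Returns None if no node has capacity.
    """
    ranked = sorted(
        free_slots,
        key=lambda node: (remote_shuffle_bytes(partition_bytes_by_node, node), node),
    )
    for node in ranked:
        if free_slots[node] > 0:
            return node
    return None


if __name__ == "__main__":
    # 100 tasks of 30 s each with 300 s left: 10 waves fit, so 10 slots suffice.
    print(min_concurrent_tasks(100, 30, 300))

    # Map output destined for one Reduce partition, in bytes per node.
    partition = {"n1": 500, "n2": 200, "n3": 50}
    # n1 holds most of the data but is full, so the next-best node is chosen.
    print(pick_reduce_node(partition, {"n1": 0, "n2": 1, "n3": 2}))
```

In this sketch the deadline estimate gives the share of resources a job would need to be guaranteed, and the placement function falls back to the next-cheapest node when the preferred one is full, which mirrors the spare-node idea only in spirit.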
Keywords/Search Tags: Hadoop, YARN, Job Scheduling