The Research And Optimization Of JOB Schedule Algorithm In Hadoop

Posted on:2017-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:L Bao

Full Text:PDF

GTID:2348330503492871

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet industry, Big Data technology has been widely adopted. As a major parallel computing platform for massive data, Hadoop faces increasingly stringent performance demands. The Hadoop scheduler is in charge of scheduling jobs and resources, and its algorithm largely determines the performance of the cluster. Hence it is crucial to study and optimize the job scheduling algorithm of Hadoop. This thesis carries out research on job scheduling algorithms, and the main work is as follows:

1. Current deadline-based job scheduling algorithms do not work on Hadoop YARN. This thesis proposes a job scheduling algorithm based on the YARN architecture that maximizes the number of jobs meeting their response-time constraints. The algorithm predicts the minimum number of concurrent tasks needed to guarantee a job's response time and divides the resources according to this value. By applying different scheduling methods to the two kinds of resources, the algorithm maximizes resource utilization without jeopardizing the jobs' time limits. Experiments show that this algorithm effectively increases the number of jobs completed before their deadlines.

2. To reduce the network overhead generated during the Shuffle phase, this thesis proposes a Reduce task scheduling algorithm. By analyzing the impact of data locality and the amount of data transferred on the network overhead of the Shuffle phase, an index is proposed to measure the network overhead generated by a Reduce task. The algorithm selects the nodes on which to run Reduce tasks based on this index. To avoid node overload caused by skew in Reduce tasks, the algorithm provides a selection scheme for spare nodes. Experiments show that the algorithm effectively reduces the network overhead of the Shuffle phase and shortens the response time of jobs that spend a larger proportion of their time in the Reduce phase.
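
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: the abstract gives neither the prediction model nor the overhead index, so both are assumed here. The function names, the "wave" model (tasks of equal average duration running in rounds), and the use of remote bytes as the overhead index are all hypothetical choices made for illustration only.

```python
import math
from typing import Dict, Optional


def min_concurrent_tasks(num_tasks: int, avg_task_seconds: float,
                         seconds_to_deadline: float) -> Optional[int]:
    """Smallest number of parallel task slots that still meets the deadline.

    Assumption (not from the thesis): tasks run in equal-length waves, so
    n slots need ceil(num_tasks / n) waves of avg_task_seconds each.
    """
    if num_tasks <= 0:
        return 0
    # Largest number of waves that still fits before the deadline.
    max_waves = math.floor(seconds_to_deadline / avg_task_seconds)
    if max_waves < 1:
        return None  # even unlimited parallelism cannot meet the deadline
    return math.ceil(num_tasks / max_waves)


def pick_reduce_node(partition_bytes_by_node: Dict[str, int],
                     free_slots: Dict[str, int]) -> Optional[str]:
    """Choose a node for one Reduce task by a hypothetical overhead index.

    Assumed index: bytes of the task's input partition that would have to be
    fetched over the network, i.e. total bytes minus bytes already local.
    Nodes without a free slot are skipped, so the next-best node acts as
    the spare.
    """
    total = sum(partition_bytes_by_node.values())
    # Rank nodes by remote bytes (ascending); the name breaks ties.
    ranked = sorted(
        free_slots,
        key=lambda node: (total - partition_bytes_by_node.get(node, 0), node),
    )
    for node in ranked:
        if free_slots[node] > 0:
            return node
    return None  # no node has capacity right now


if __name__ == "__main__":
    print(min_concurrent_tasks(100, 30, 600))
    print(pick_reduce_node({"a": 500, "b": 300, "c": 200},
                           {"a": 0, "b": 1, "c": 2}))
```

Under these assumptions, a job with 100 tasks of 30 seconds each and 600 seconds left needs at least 5 concurrent tasks, and a Reduce task whose best node "a" is full falls back to node "b", which holds the next-largest share of its input.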

Keywords/Search Tags:

Hadoop, YARN, Job Scheduling

Related items

1	Research On SLA-Aware Energy-Efficient Scheduling Strategy For Hadoop Yarn
2	Research On The Energy-Efficient Hadoop YARN Resource Scheduling Strategy Based On State Matrix
3	Research And Implementation Of Highresponsive Hadoop Computing Resource Scheduler Based On YARN
4	Design And Implementation Of YARN Resource Scheduling Strategy Optimization Method
5	Research And Implementation Of High Concurrent Opportunistic Resource Allocation In Hadoop YARN
6	Research And Implementation Of Hadoop Job Scheduling Algorithm In Heterogeneous Environment
7	The Research And Optimization Of JOB Schedule Algorithm In Hadoop
8	Research On Resource Allocation And Scheduling In Hadoop YARN
9	Based On Improved Hadoop Yarn Scheduler Design And Implementation Of Large Data Supporting Platform
10	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster