
Research On Job Scheduling Algorithm In Hadoop

Posted on: 2018-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
Full Text: PDF
GTID: 2348330533463241
Subject: Engineering
Abstract/Summary:
Hadoop is a distributed computing platform on which distributed applications for processing large-scale data can be written and run. With the development of the Internet and the rise of a new generation of digital technology, people's consumption habits and lifestyles are changing: e-commerce and social networking have become an important part of daily life, and these fields in turn generate huge amounts of data. This thesis reviews the state of research on Hadoop scheduling algorithms in depth and studies three of their weaknesses: low scheduling efficiency, low resource utilization, and poor suitability for heterogeneous environments. First, the advantages and disadvantages of Hadoop's native FIFO algorithm and of job classification algorithms are analyzed in detail. To address the low resource utilization of a single queue and its poor adaptability to the environment, a scheduling algorithm based on multiple queues and rotation is proposed. The algorithm divides jobs into different queues and then assigns each queue a time slice according to the deadlines of its jobs, so that jobs in the different queues are executed in turn, which improves cluster efficiency and resource utilization. Second, the advantages and disadvantages of Hadoop's native Fair algorithm and of task-based algorithms are analyzed in detail. To address the starvation of small jobs and low system resource utilization, a scheduling algorithm based on task time and exponential smoothing is proposed. The algorithm computes a job's execution rate from the historical progress of its tasks on the cluster, predicts the next progress rate by exponential smoothing, dynamically estimates the remaining execution time of the job's Map phase and Reduce phase, and then orders jobs according to an analysis of the difference between the Map and Reduce phases. This addresses the problem of inefficient resource use. Finally, a Hadoop operating environment is simulated experimentally, and the MQWR and TMF algorithms are each evaluated. Comparison experiments against the Fair scheduling algorithm and the Task Schedule Deadline scheduling algorithm illustrate the effectiveness of the proposed algorithms.
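The remaining-time estimate described in the abstract can be illustrated with a minimal sketch. It assumes simple (single) exponential smoothing over observed progress rates and a remaining time equal to the unfinished fraction divided by the smoothed rate. The abstract does not give the thesis's actual formulas, so the function names, the `alpha` parameter, and the sample format below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def smooth_rate(rates: List[float], alpha: float = 0.5) -> float:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    `alpha` is a hypothetical smoothing factor in (0, 1]; the first
    observation is used as the initial smoothed value.
    """
    if not rates:
        raise ValueError("at least one rate observation is required")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = rates[0]
    for rate in rates[1:]:
        smoothed = alpha * rate + (1.0 - alpha) * smoothed
    return smoothed


def estimate_remaining_time(
    samples: List[Tuple[float, float]], alpha: float = 0.5
) -> float:
    """Estimate the remaining seconds of one phase (Map or Reduce).

    `samples` is a time-ordered list of (timestamp_seconds, progress)
    pairs, with progress expressed as a fraction between 0 and 1.
    """
    if len(samples) < 2:
        raise ValueError("at least two progress samples are required")
    rates = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((p1 - p0) / (t1 - t0))  # progress per second
    rate = smooth_rate(rates, alpha)
    if rate <= 0.0:
        return float("inf")  # no measurable progress yet
    remaining_fraction = 1.0 - samples[-1][1]
    return remaining_fraction / rate


# Hypothetical progress history of a Map phase, sampled every 10 seconds.
map_samples = [(0.0, 0.0), (10.0, 0.1), (20.0, 0.3), (30.0, 0.4)]
print(estimate_remaining_time(map_samples, alpha=0.5))
```

With these made-up samples the smoothed rate is 0.0125 progress units per second, so the estimate is about 48 seconds for the remaining 60% of the phase. A scheduler in the spirit of the abstract could compute this estimate separately for the Map and Reduce phases of each job and use the results when ordering jobs.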
Keywords/Search Tags:resource utilization, cluster efficiency, multi-queue, rotation scheduling, exponential smoothing, Hadoop