The Research And Applications Of Job Scheduling Algorithm On Hadoop Platform

Posted on: 2016-05-06; Degree: Master; Type: Thesis
Country: China; Candidate: J Kang; Full Text: PDF
GTID: 2298330467490977; Subject: Computer technology
Abstract/Summary:
With the deepening of informatization and the boom in the digital equipment market, the amount of data generated worldwide is growing rapidly, and cloud computing has become an important model for emerging technologies and service computing. Apache Hadoop, an open source cloud computing platform, provides powerful data processing tools and has been widely adopted and supported in industry. The job scheduling algorithm on the Hadoop platform is responsible for allocating computational resources and determining the order of job execution. Selecting an appropriate scheduling algorithm can effectively shorten job response time and improve the computational efficiency of the whole cluster. Researching and improving the job scheduling algorithm is therefore of great significance for improving the overall performance of the platform.
Through the study and analysis of the existing scheduling algorithms on the Hadoop platform, we address two problems of the LATE scheduling algorithm: it does not consider the cost of rescheduling a backup task, and its estimate of task remaining time is statically limited. We propose a LATE rescheduling algorithm based on rescheduling cost. The improved algorithm has two main parts: (1) deciding whether to execute the backup mechanism based on the cost of rescheduling the task, that is, estimating the task's rescheduling cost to decide whether to start a backup task, and ensuring that starting the backup mechanism improves the progress of the job; (2) changing the parameters used to compute task remaining time based on the historical information of the task tracker, adjusting the proportion of time spent in each stage of the task in real time to improve the accuracy of the estimated task progress and task remaining time.
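The two parts described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the remaining-time estimate follows the usual LATE heuristic (remaining work divided by observed progress rate), and the cost test, the names `TaskStatus`, `should_launch_backup`, `stage_weights` and `weighted_progress`, and the parameters `fresh_run_time` and `reschedule_overhead` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    progress: float  # fraction of the task completed, in [0, 1]
    elapsed: float   # seconds since the task started


def late_time_left(task: TaskStatus) -> float:
    """LATE-style estimate: (1 - progress) / (progress / elapsed)."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no progress observed yet, so no finite estimate
    return task.elapsed * (1.0 - task.progress) / task.progress


def should_launch_backup(task: TaskStatus, fresh_run_time: float,
                         reschedule_overhead: float) -> bool:
    """Assumed cost test for part (1): start a backup only if it is
    expected to finish before the original task would."""
    backup_cost = reschedule_overhead + fresh_run_time
    return late_time_left(task) > backup_cost


def stage_weights(history: list[tuple[float, ...]]) -> list[float]:
    """Assumed form of part (2): per-stage time proportions computed
    from the stage durations of previously finished tasks."""
    totals = [sum(stage) for stage in zip(*history)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def weighted_progress(weights: list[float], stage_index: int,
                      stage_fraction: float) -> float:
    """Overall progress: completed stages plus the weighted share of the
    stage currently running."""
    return sum(weights[:stage_index]) + weights[stage_index] * stage_fraction
```

In this sketch, `weighted_progress` would supply the `progress` field of `TaskStatus`, so weights learned from history feed directly into the remaining-time estimate and therefore into the backup decision.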
Finally, we set up an experimental Hadoop platform and ran several experiments comparing the average response time of different jobs under the new algorithm, the LATE scheduling algorithm, and the SAMR algorithm. The experimental results show that the proposed LATE rescheduling algorithm based on rescheduling cost effectively reduces job response time and improves execution efficiency.
Keywords/Search Tags: Cloud computing, Hadoop, LATE, job scheduling algorithm, rescheduling cost