Research On Data Locality Of Hadoop Task Scheduling

Posted on: 2015-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhou
Full Text: PDF
GTID: 2308330452957191
Subject: Computer technology
Abstract/Summary:
At present, large-scale data processing systems use the Hadoop platform, which is based on the MapReduce framework, to process and analyze data. Job scheduling must choose an appropriate task from an appropriate job. When a task is scheduled, data locality can reduce network overhead and improve performance. Existing task scheduling methods have shortcomings, however, so it is important to study a Hadoop task scheduling method based on data locality.

First, the problems of the existing task scheduling method are analyzed. Then the overall structure of a preemptive scheduling method is designed, and its four functional modules are introduced: locality checking, estimation of a task's remaining time, killing and starting tasks, and clearing a task's start information.

In the implementation of each module, the locality checking part gives the algorithm for calculating the distance between two nodes and the algorithm for checking the locality of a task; the remaining execution time part gives the estimate for a task that has been started or is about to be started; the killing and starting part describes how to kill a running task and start a new one; and the clearing part describes how to remove the start information of a task.

The experimental results show that, when the nodes in the cluster are in the same rack and some nodes compute and transmit data slowly, the preemptive task scheduling method can effectively reduce the running time. When the nodes are not in the same rack, copying the input data of a non-local task brings greater network overhead, so the improvement should be more pronounced. The preemptive task scheduling method can improve the performance of Hadoop to a certain extent.
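The abstract does not reproduce the thesis's algorithms, so the following is only a minimal sketch of the kind of logic the four modules describe, under stated assumptions. It assumes nodes are named by topology paths of the form `/rack/node`, that distance is the number of hops from each node up to their closest common ancestor, and that remaining time is estimated from the observed progress rate. The function names, the path format, and the preemption rule in `should_preempt` are all hypothetical and are not taken from the thesis or from the Hadoop API.

```python
from typing import Sequence


def node_distance(a: str, b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor.

    Assumes topology paths such as "/rack1/node1" (an illustrative format).
    """
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def locality_level(task_node: str, replica_nodes: Sequence[str]) -> str:
    """Classify a task by its distance to the nearest replica of its input."""
    nearest = min(node_distance(task_node, r) for r in replica_nodes)
    if nearest == 0:
        return "node-local"
    if nearest == 2:
        return "rack-local"
    return "off-rack"


def estimate_remaining(progress: float, elapsed: float) -> float:
    """Progress-rate estimate: time left = elapsed * (1 - progress) / progress.

    This heuristic is an assumption; the thesis may estimate differently.
    """
    if progress <= 0.0:
        return float("inf")
    return elapsed * (1.0 - progress) / progress


def should_preempt(remaining_non_local: float, expected_local_runtime: float) -> bool:
    """Hypothetical rule: kill a non-local task and restart it locally only
    if the local run is expected to finish sooner than the running task."""
    return expected_local_runtime < remaining_non_local


if __name__ == "__main__":
    replicas = ["/rack1/node2", "/rack2/node5"]
    print(locality_level("/rack1/node1", replicas))
    print(estimate_remaining(progress=0.25, elapsed=10.0))
```

Under these assumptions, two nodes in the same rack are at distance 2 and nodes in different racks are at distance 4, which matches the abstract's observation that off-rack input copies carry the greater network cost.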
Keywords/Search Tags: Hadoop, Task Scheduling, Data Locality, Rack-off