
The Research Of Job Scheduling In Map/Reduce-styled Massive Data Processing Platform

Posted on: 2015-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2298330452953414
Subject: Computer Science and Technology
Abstract/Summary:
Map/Reduce-styled data processing platforms are a cutting-edge technology in the field of massive data processing. Unlike traditional platforms, a Map/Reduce-styled platform offers new features: a simplified parallel programming model, a computing framework based on the data locality principle, and fine-grained resource allocation.

Job scheduling is one of the core components of a Map/Reduce platform. To ensure that platform resources are shared fairly among multiple users and that jobs run efficiently, an efficient scheduler is required to manage and dispatch platform resources. Existing schedulers do not jointly optimize map tasks and reduce tasks, although there is a strong dependence between them. Because map tasks finish at different times, reduce tasks sit idle while waiting for them, which lowers resource utilization and increases the average job turnaround time.

The main work of this paper focuses on preemptive scheduling for the Map/Reduce platform, which preempts the resources held by reduce tasks during their idle time. To increase resource utilization and shorten job execution time, this paper analyzes the issues of preemptive scheduling and the suspend and reassign model for reduce tasks. The main contributions of this paper are as follows:

(1) The architecture of the preemptive scheduler and a task status model are designed. The preemptive scheduler uses a Master/Slave architecture to separate the decision to suspend a reduce task from the execution of the suspension. The Master/Slave architecture also makes it easy to integrate the preemptive scheduler into a Map/Reduce platform. The task status model, which defines the suspended status of reduce tasks, facilitates management of the task lifecycle.

(2) To support the preemptive scheduler, a resource preemption model and a reduce task reassignment model are presented. Both models are driven by the progress of map tasks and reduce tasks, and aim to reduce the overhead of suspend and resume operations.

(3) The preemptive scheduler is seamlessly integrated with the existing scheduler of the Map/Reduce platform. Because preempted resources are available only for a short time, they are assigned a higher priority than regular resources. To lower the impact of suspend and resume operations, preempted resources can be assigned to map tasks only.

(4) The preemptive scheduler is integrated into Predoop, a Map/Reduce-styled platform. Predoop implements the resource preemption model, the reduce task reassignment model, and preemptive job scheduling.

(5) Experimental results show that Predoop (with preemptive job scheduling) outperforms Hadoop (an open source Map/Reduce platform, with FCFS (First Come First Served) job scheduling) on average job turnaround time by up to 14.55%. Predoop significantly improves the efficiency of job execution when a job contains many map tasks.
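The assignment rule in contribution (3) can be illustrated with a small sketch. This is not Predoop's implementation. The names `Task`, `Slot`, and `assign` are hypothetical, and the sketch assumes a single FCFS queue of pending tasks. It shows only the two stated policies: preempted slots are offered before regular slots, and preempted slots accept map tasks only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    job: str
    kind: str  # "map" or "reduce"


@dataclass
class Slot:
    node: str
    preempted: bool  # True if borrowed from a suspended (idle) reduce task


def assign(slots: List[Slot], pending: List[Task]) -> List[Tuple[str, Task]]:
    """Hypothetical sketch: match free slots to pending tasks.

    Preempted slots are handled first (higher priority) and take only
    map tasks; regular slots take the head of the FCFS queue.
    """
    queue = list(pending)  # copy so the caller's list is not modified
    plan = []
    # Sort so that preempted slots come first (False sorts before True).
    for slot in sorted(slots, key=lambda s: not s.preempted):
        if slot.preempted:
            # First pending map task in arrival order, if any.
            idx = next((i for i, t in enumerate(queue) if t.kind == "map"), None)
        else:
            idx = 0 if queue else None
        if idx is not None:
            plan.append((slot.node, queue.pop(idx)))
    return plan


slots = [Slot("n1", preempted=False), Slot("n2", preempted=True)]
pending = [Task("j1", "reduce"), Task("j1", "map"), Task("j2", "map")]
plan = assign(slots, pending)
```

In this example the preempted slot on `n2` is filled first with the earliest pending map task, and the regular slot on `n1` then takes the reduce task at the head of the queue. A preempted slot stays empty if only reduce tasks are pending.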
Keywords/Search Tags: Massive Data Processing, Map/Reduce-Styled Platform, Preemptive Scheduling