
A Priority-based Scheduling Algorithm For Hadoop

Posted on: 2013-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Fan
Full Text: PDF
GTID: 2248330395951217
Subject: Software and theory
Abstract/Summary:
As science and technology advance, cloud computing has become deeply rooted in everyday life, and cloud-based distributed platforms have become a research hot spot. Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It provides an open, stable, and reliable dataflow, and it implements Google's MapReduce model: a program is divided into a large number of work units, each of which can be executed on a node in the cluster. This research focuses on job scheduling. Hadoop ships with the capacity scheduler, the fair scheduler, and the HOD scheduler. Current research on Hadoop scheduling algorithms follows two major directions. One builds on the MapReduce architecture and pursues optimization through less data shuffling, less I/O throughput, and time estimation. The other builds on the Hadoop fair scheduling algorithm and optimizes the scheduling strategy. Neither direction involves a scheduling algorithm based on priority. We develop a priority-based scheduling algorithm to reduce the average waiting time of high-priority jobs in a priority job queue. We describe the definition of job priority and the architecture of priority-based Hadoop MapReduce. Experiments compare the average waiting time at every priority level under the general scheduling algorithm and under the priority-based scheduling algorithm.
Keywords/Search Tags: cloud computing, Hadoop MapReduce, scheduling algorithm, priority
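
The comparison the abstract describes, average waiting time per priority level under a general scheduler versus a priority-based one, can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the thesis's algorithm: it assumes a single execution slot, non-preemptive jobs, a first-in-first-out baseline as the "general" scheduler, and that a lower priority number means a more urgent job. The function name `average_wait_by_priority` and the job tuples are hypothetical and do not correspond to any Hadoop API.

```python
import heapq
from collections import defaultdict


def average_wait_by_priority(jobs, use_priority):
    """Simulate one execution slot and return {priority: average wait}.

    jobs: list of (arrival_time, priority, run_time) tuples.
    Assumption: a lower priority number means a more urgent job.
    use_priority=False gives a FIFO baseline; True picks the most
    urgent waiting job first (non-preemptive).
    """
    # Keep the submission index so ties are broken in submission order.
    pending = sorted(enumerate(jobs), key=lambda item: (item[1][0], item[0]))
    ready = []          # heap of jobs that have arrived and are waiting
    clock = 0
    next_idx = 0
    waits = defaultdict(list)

    while next_idx < len(pending) or ready:
        # Admit every job that has arrived by the current time.
        while next_idx < len(pending) and pending[next_idx][1][0] <= clock:
            seq, (arrival, priority, run_time) = pending[next_idx]
            key = (priority, arrival, seq) if use_priority else (arrival, seq)
            heapq.heappush(ready, (key, arrival, priority, run_time))
            next_idx += 1

        if not ready:
            # Slot is idle: jump ahead to the next arrival.
            clock = pending[next_idx][1][0]
            continue

        _, arrival, priority, run_time = heapq.heappop(ready)
        waits[priority].append(clock - arrival)
        clock += run_time

    return {p: sum(w) / len(w) for p, w in sorted(waits.items())}


# Hypothetical workload: two long low-priority jobs submitted ahead of
# two short, more urgent ones, all arriving at time 0.
example_jobs = [(0, 2, 4), (0, 2, 4), (0, 0, 2), (0, 1, 2)]
fifo_waits = average_wait_by_priority(example_jobs, use_priority=False)
priority_waits = average_wait_by_priority(example_jobs, use_priority=True)
print("FIFO:", fifo_waits)
print("Priority:", priority_waits)
```

In this toy workload the urgent jobs wait behind the long jobs under FIFO, while the priority ordering runs them first at the cost of longer waits for the lowest level. A real Hadoop cluster has many task slots, data locality constraints, and per-task scheduling, none of which this sketch models.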