Font Size: a A A

Researches On Mapreduce Offline Scheduling Algorithm In Heterogeneous Environment

Posted on:2015-08-12Degree:MasterType:Thesis
Country:ChinaCandidate:H ChenFull Text:PDF
GTID:2298330452964005Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Thebigdataiscomingnow,andhowtoefcientlyhandlemassivedatahasbecomea new challenge in computer science. MapReduce is a new computing model whichaims to solve this kind of problem. Compared to the previous models, MapReduce iseasy to learn and use, and its performance is efcient and stable. Our research wants tosolve the ofine scheduling problem for MapReduce in a heterogeneous environment.As we all know, for most companies that have Hadoop clusters, a large part of jobsare daily processing jobs which are performed periodically. For example, Google dailyanalyzers new crawled pages and users’logs. Obviously, how to schedule these jobsis important since a well-designed execution order will reduce the total running timefor the jobs and make the system release resources earlier.Our work can be briefy summarized as follows: In a heterogeneous environmen-t, for a given set of independent MapReduce jobs, design a scheduling algorithm tominimize the makespan of all jobs. By surveying the literatures, we fnd the problemis NP-hard. We compare this problem with the two-stage fexible fow shop problem-s creatively and propose a heuristic scheduling algorithm called Hybrid MultistageScheduling algorithm. In our work, we tries to solve the scheduling problem by break-ing down it into two-subproblems: sequencing and dispatching. For sequencing, wedevelop a heuristic based on Johnson’s algorithm which considers the precedence re-lationships between map and reduce tasks. For dispatching, we split it into two stages.During map stage, we employ the Min-Min algorithm to balance the load of machines.During reduce stage, we propose Dyanmic-Min-Min algorithm which will maintain aset contains all arrived jobs and dispatch jobs from this set. By this way, the reducetasks can been uniformly assigned to the cluster. To evaluate the performance of our algorithm, we designed a MapReduce simula-tor. Our simulation results on two kinds of workloads demonstrate that every heuristicemployed in our algorithm contributes to reducing the makespan. As a whole, it im-proves the performance ranging from51%to77%compared to FIFO.
Keywords/Search Tags:task scheduling, MapReduce, makespan, heteroge-neous system, heuristic algorithm
PDF Full Text Request
Related items