Research On Optimization Of MapReduce Job Scheduling Technology

Posted on:2016-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:D Q Liang

Full Text:PDF

GTID:2348330503977884

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, data has penetrated every industry and business sector and has become an important factor of production. The volume of data generated on the Internet each day far exceeds the carrying capacity of existing IT infrastructure, and real-time requirements exceed existing computing power. As a cloud computing data processing system, Hadoop uses data-parallel computing to process large data sets and has been widely applied in many fields. However, most existing MapReduce job schedulers do not consider job deadline requirements, so some jobs cannot be completed in time. Most job schedulers also apply a "best effort" policy to the localized execution of jobs, so the job set cannot take full advantage of data locality, and network transmission cost becomes an efficiency bottleneck. In addition, most job schedulers do not consider cluster heterogeneity and cannot select suitable compute nodes for jobs based on local conditions, which results in inefficient job execution.

In response to these problems, this paper proposes MCF, a two-tier scheduling algorithm aimed at improving the execution efficiency of data-parallel jobs. In the first-level scheduling, MCF establishes a multi-user waiting queue, pre-assigns resources (storage, compute, bandwidth) to jobs based on their deadlines, estimates the remaining time of jobs, minimizes the average delay time, and provides a basis for finer-grained task assignment. In the second-level scheduling, MCF combines tasks into task groups based on the location of their data blocks in order to speed up scheduling. MCF then establishes a waiting time model and an execution time model for jobs that take data locality and cluster heterogeneity into account. Finally, MCF generates scheduling sequences for tasks using a strategy based on minimum cost flow, with the goal of reducing the average response time of the whole job set.

We designed and implemented the MCF scheduling algorithm on the high-performance computing center. The experimental results show that MCF effectively reduces the average response time of the job set and decreases the average delay time, and that it has certain performance advantages over the FIFO, Capacity, and Fair schedulers.
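The abstract does not give the flow network that MCF builds, so the following is only a minimal sketch of the general idea of assigning tasks to nodes by minimum cost flow. It assumes a hypothetical cost matrix `task_cost[t][n]` (an estimated cost of running task `t` on node `n`, lower when the data block is local or the node is faster) and a hypothetical `slots` list of free slots per node. The names `MinCostFlow` and `assign_tasks` are illustrative and are not taken from the thesis. The solver is the standard successive-shortest-path method, not necessarily the one the author used.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths on a residual graph (illustrative solver)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        # Each edge is [to, residual capacity, cost, index of reverse edge].
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford handles the negative reverse edges.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev[t] is None:  # no augmenting path left
                return flow, cost
            # Find the bottleneck capacity along the shortest path.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Push flow along the path and update residual capacities.
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign_tasks(task_cost, slots):
    """Assign each task to one node, minimizing total estimated cost.

    task_cost[t][n]: hypothetical cost of running task t on node n.
    slots[n]: hypothetical number of free slots on node n.
    """
    n_tasks, n_nodes = len(task_cost), len(slots)
    source, sink = 0, 1 + n_tasks + n_nodes
    net = MinCostFlow(sink + 1)
    for t in range(n_tasks):
        net.add_edge(source, 1 + t, 1, 0)  # each task is scheduled once
        for n in range(n_nodes):
            net.add_edge(1 + t, 1 + n_tasks + n, 1, task_cost[t][n])
    for n in range(n_nodes):
        net.add_edge(1 + n_tasks + n, sink, slots[n], 0)  # slot limit
    _, total_cost = net.solve(source, sink)
    assignment = {}
    for t in range(n_tasks):
        for v, cap, _, _ in net.graph[1 + t]:
            if v > n_tasks and cap == 0:  # saturated task-to-node edge
                assignment[t] = v - 1 - n_tasks
    return assignment, total_cost


if __name__ == "__main__":
    # Tasks 0 and 1 both prefer node 0, but node 0 has only one free slot.
    print(assign_tasks([[1, 4], [1, 3], [5, 2]], [1, 2]))
```

In this toy input, the cheapest complete schedule places task 0 on node 0 and tasks 1 and 2 on node 1, for a total cost of 6. A greedy "best effort" locality policy could instead give node 0 to task 1 and pay more overall, which is the kind of loss a global flow formulation is meant to avoid.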

Keywords/Search Tags:

big data processing, MapReduce, job scheduling, minimum cost flow

Related items

1	Research And Implementation Of Scheduling Algorithm Based On MapReduce Cluster
2	Several Research And Analysis Base On Min-cost Max-flow Algorithm
3	Research On Efficient Task Partition And Scheduling In MapReduce Data Processing System
4	The Research Of Job Scheduling Algorithm In Mapreduce-styled Massive Data Processing Platform
5	Research On Task Allocation Based On Minimum Cost Flow Algorithm In Mobile Crowd Sensing
6	The multiobjective average network flow problem: Shortest path and minimum cost flow formulations, algorithms, heuristics, and complexity
7	The Research Of The Maximum Flow And The Minimum Cost Algorithm
8	MapReduce Job Oriented Collaborative Optimization On Cloud Data Center Network Resource
9	A Permissible Edge Algorithm For Solving Minimum Cost Flow By A Dual Approach
10	Study On Resource Context And Job Cost-Aware Job Scheduling Optimization For Hadoop Mapreduce Framework