
Scheduling Optimization Research For MapReduce

Posted on: 2017-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: S L Gao
Full Text: PDF
GTID: 2308330485981022
Subject: Computer system architecture
Abstract/Summary:
The era of big data has arrived together with the Internet era. Efficient processing of large-scale data is of great significance to production and practice, and traditional computing models can no longer meet the requirements of mass data processing. Hadoop, composed of HDFS and MapReduce and inspired by the GFS and MapReduce designs published by Google in 2004, has been widely used for mass data processing. As the core component responsible for distributed processing, MapReduce has made its performance a hot research issue. Through extensive research and experiments, we find that data locality is an important factor affecting the performance of MapReduce; it also affects the network bandwidth consumption of the cluster and the execution efficiency of its nodes. This thesis therefore takes data locality as the entry point for improving execution efficiency.

Data locality means that computation happens where the source data is located. This thesis analyzes the scheduling mechanism of MapReduce in depth and finds that the coarse-grained rules MapReduce uses to select data blocks and computing nodes lead to a low degree of data locality. By abstracting task scheduling and resource distribution, the thesis proposes two efficient task scheduling algorithms, Bolas and Bolas+, to solve this problem and improve job execution efficiency.

This thesis makes novel contributions to MapReduce scheduling optimization. Bolas abstracts task scheduling as weighted optimal bipartite graph matching, solving the mismatch between computing nodes and data blocks. Bolas+ proposes a lightweight scheduling strategy based on data block marking, in which data blocks and nodes are considered at a finer granularity during scheduling. A large number of experiments show that Bolas can raise data locality to 100% and reduce total job execution time by up to 15%.
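The bipartite-matching formulation can be illustrated with a tiny sketch (this is an illustration, not the thesis's implementation): tasks form one side of the graph, nodes the other, and edge weights encode placement cost. The example below assumes hypothetical costs of 0 for a node-local block, 1 for rack-local, and 2 for remote, and finds the minimum-cost assignment by brute force over permutations, which is feasible only for tiny inputs; the Hungarian algorithm solves the same problem in O(n³).

```python
from itertools import permutations

# Cost of running each task (row) on each node (column):
# 0 = node-local, 1 = rack-local, 2 = remote (hypothetical weights).
cost = [
    [0, 2, 2],
    [2, 0, 1],
    [1, 2, 0],
]

def optimal_assignment(cost):
    """Exhaustively find the minimum-cost perfect matching between
    tasks and nodes. Fine for a tiny illustration; a real scheduler
    would use the O(n^3) Hungarian algorithm instead."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[task][node] for task, node in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

total, assignment = optimal_assignment(cost)
print(total, assignment)  # total cost 0: every task lands on its local node
```

Here the optimal matching places every task on the node holding its data block, which is exactly the 100% data-locality outcome the thesis reports for Bolas.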
The average data locality of Bolas+ remains above 95% and tends to 100% as jobs grow larger, while job execution efficiency improves by 15% on average. The computational complexity of Bolas is O(n³), so its response time becomes a burden as jobs grow. Bolas+ solves this problem gracefully with a computational complexity of O(n/m), where n is the number of data blocks and m is the number of computing nodes.
Keywords/Search Tags:MapReduce, Data-locality, Task Scheduling, Bipartite graph matching, Data block marking