Research On Optimization And Improvement Of MapReduce Job Scheduling Algorithm

Posted on:2015-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:B Wan

Full Text:PDF

GTID:2268330428470018

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, the IT industry has been moving toward consumerization and socialization. The steady accumulation of big data has opened a new era of computing, and cloud computing brings enormous opportunities and challenges. Inspired by Google's three landmark papers on cloud computing and driven by the Apache community, Hadoop has gradually developed into the most widely used open source cloud computing platform. As one of the core technologies in Hadoop, the MapReduce framework and its job scheduling algorithm have a significant influence on the performance of the entire system, and data locality determines the quality of a scheduling algorithm. This thesis focuses on the data locality problem of the native MapReduce job scheduling algorithms and is organized as follows.

First, the basic architecture of Hadoop and the technologies related to MapReduce are discussed and analyzed, with emphasis on the basic principles, job processing mechanism, and job scheduling mechanism of MapReduce.

Second, the current state of research on the native MapReduce job scheduling algorithms and their improved variants is reviewed in detail, and the advantages and disadvantages of these algorithms are summarized. In addition, the existing job scheduling algorithms that target data locality are analyzed specifically, and their defects in data locality are summarized to provide a basis for the follow-up research.

Third, to address the problem that the existing Hadoop job scheduling algorithms cannot guarantee good data locality, a Hadoop MapReduce job scheduling algorithm based on resource prefetching is proposed and implemented by incorporating data prefetching technology. Before a non-local map task is assigned, its input data is prefetched to the disk of the TaskTracker to which it will be assigned. At a certain cost in network bandwidth and disk space, this greatly improves the data locality of the job, so the job executes more efficiently.

Finally, a small experimental Hadoop cluster is set up and experimental scenarios are designed. The improved job scheduler and the three existing Hadoop job schedulers are each configured to run on the cluster and are compared experimentally. The results indicate that the proposed algorithm effectively improves data locality, reduces job execution time to some extent, and achieves higher overall performance.
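
The prefetching idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `MapTask`, `Node`, `assign`, and `complete_prefetch` are hypothetical, the policy of prefetching for the first pending task is an assumption, and the actual data copy is simulated by recording the task ID on the node.

```python
from dataclasses import dataclass, field


@dataclass
class MapTask:
    task_id: str
    replica_hosts: set  # hosts that already store a replica of the input block


@dataclass
class Node:
    host: str
    # IDs of tasks whose input has been prefetched to this node's disk
    prefetched: set = field(default_factory=set)


def is_local(task, node):
    """A task is local if the node holds a replica or a prefetched copy."""
    return node.host in task.replica_hosts or task.task_id in node.prefetched


def assign(node, pending, prefetch_queue):
    """Handle a node reporting a free map slot (hypothetical policy).

    Returns a local task if one exists. Otherwise it requests a prefetch
    of the first pending task's input to this node and returns None, so
    the task can run locally on a later scheduling round.
    """
    for task in pending:
        if is_local(task, node):
            pending.remove(task)  # safe: we return immediately
            return task
    if pending:
        request = (pending[0].task_id, node.host)
        if request not in prefetch_queue:
            prefetch_queue.append(request)
    return None


def complete_prefetch(nodes_by_host, prefetch_queue):
    """Simulate the data copy finishing for every queued request."""
    for task_id, host in prefetch_queue:
        nodes_by_host[host].prefetched.add(task_id)
    prefetch_queue.clear()
```

In this sketch the cost mentioned in the abstract shows up as the extra copy recorded in `prefetched` (disk space) and the transfer that `complete_prefetch` stands in for (network traffic).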

Keywords/Search Tags:

Cloud Computing, Hadoop, MapReduce, Job Scheduling, Data Locality, Resource Prefetching

Related items

1	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
2	Credibility-based Resource Scheduling Strategy Under Cloud Platform
3	Research On Scheduling Algroithm In Hadoop Mapreduce
4	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop
5	Research And Improvement Of Resource Scheduler Algorithm Based On Hadoop
6	Research And Improvement Of MapReduce Scheduling Mechanism On Cloud Computing
7	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
8	Study On Hadoop Resource Scheduling Strategy Based On IaaS Cloud Platform
9	The Research And Implementation Of Hadoop Scheduling Algorithm
10	Research On Algorithm Analysis And Modificating Of Job Scheduling For Hadoop