Research On Optimization And Improvement Of MapReduce Job Scheduling Algorithm

Posted on: 2015-02-22  Degree: Master  Type: Thesis
Country: China  Candidate: B Wan  Full Text: PDF
GTID: 2268330428470018  Subject: Computer application technology
Abstract/Summary:
With the rapid development of internet technology, the IT industry has become increasingly consumer-oriented and social. The steady accumulation of big data has opened a new era of computing, and cloud computing brings enormous opportunities and challenges. Motivated by Google's three seminal papers on cloud computing and supported by the Apache community, Hadoop has gradually developed into the most widely used open source cloud computing platform. As one of the core technologies in Hadoop, the MapReduce framework and its job scheduling algorithm have a significant influence on the performance of the entire system, and data locality largely determines the quality of a scheduling algorithm. This thesis focuses on the data locality issue of the native MapReduce job scheduling algorithms and is organized as follows.

First, the basic architecture of Hadoop and the technologies related to MapReduce are discussed and analyzed, with emphasis on the basic principles, job processing mechanism and job scheduling mechanism of MapReduce.

Second, the current state of research on the native MapReduce job scheduling algorithms and their improved variants is described in detail, and the advantages and disadvantages of these algorithms are summarized. In addition, the existing job scheduling algorithms that target data locality are analyzed separately, and their defects with respect to data locality are summarized to provide a basis for the follow-up research.

Third, to address the problem that the existing Hadoop job scheduling algorithms cannot guarantee good data locality, a Hadoop MapReduce job scheduling algorithm based on resource prefetching is proposed and implemented, drawing on data prefetching technology. Before a non-local map task is assigned, its input data is prefetched to the disk of the TaskTracker on which the task will run. At a certain cost in network traffic and disk space, this greatly improves the data locality of the job, and the job therefore executes more efficiently.

Finally, a small experimental Hadoop cluster is set up and an experimental scenario is designed. The improved job scheduler and the three existing Hadoop job schedulers are each configured on the cluster and compared experimentally. The results indicate that the proposed algorithm effectively improves data locality, reduces job execution time to some extent, and achieves higher overall performance.
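The prefetching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `MapTask`, `Node`, `prefetch` and `assign` are hypothetical, the cluster is modeled as plain in-memory objects rather than Hadoop TaskTrackers, and the prefetch is performed synchronously, whereas a real scheduler would presumably overlap the copy with tasks that are already running.

```python
from dataclasses import dataclass, field


@dataclass
class MapTask:
    task_id: str
    block_id: str  # input block the task reads (hypothetical model)


@dataclass
class Node:
    name: str
    blocks: set = field(default_factory=set)  # blocks on the node's local disk
    free_slots: int = 1


def prefetch(node, block_id):
    """Stand-in for copying a block over the network to the node's disk."""
    node.blocks.add(block_id)


def assign(node, pending, prefetch_log):
    """Assign one pending map task to a node, preferring local input data.

    If no pending task has its input on this node, the input of the first
    pending task is prefetched to the node before the task is assigned,
    so the task still reads from local disk when it runs.
    """
    if node.free_slots == 0 or not pending:
        return None
    for task in pending:
        if task.block_id in node.blocks:
            break  # found a data-local task
    else:
        # No local task: pay the network and disk cost up front.
        task = pending[0]
        prefetch(node, task.block_id)
        prefetch_log.append((node.name, task.block_id))
    pending.remove(task)
    node.free_slots -= 1
    return task


if __name__ == "__main__":
    node = Node("tt1", blocks={"b1"}, free_slots=2)
    pending = [MapTask("m1", "b2"), MapTask("m2", "b1")]
    log = []
    print(assign(node, pending, log).task_id)  # local task is chosen first
    print(assign(node, pending, log).task_id)  # remaining task, after prefetch
    print(log)
```

In this toy model every assigned task ends up data-local, and `prefetch_log` records the extra transfers, which corresponds to the network and disk cost the abstract mentions.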
Keywords/Search Tags: Cloud Computing, Hadoop, MapReduce, Job Scheduling, Data Locality, Resource Prefetching