
Research On Scheduling In Multi-MapReduce Job For Global Makespan Optimization

Posted on: 2015-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Zhang
GTID: 2348330482452690
Subject: Computer technology
Abstract/Summary:
With the development of the Internet and the resulting explosion of data, cloud computing has developed rapidly. Within cloud computing, the MapReduce distributed computing framework has become a popular model for processing large volumes of data: because of its parallelism and high scalability, it lets programmers use distributed resources easily and carry out large-scale distributed computation efficiently. In MapReduce, the job scheduler determines the execution order of jobs and how resources are allocated to them, so the same MapReduce job can perform quite differently under different scheduling policies. Consequently, a large body of research has addressed job scheduling as a way to ensure MapReduce performance. This thesis analyzes the current research on MapReduce job scheduling in detail. To optimize the global completion time (makespan) of I/O-intensive MapReduce jobs that are executed periodically, it proposes a multi-MapReduce job scheduling algorithm based on a job consolidation benefit analysis model. The algorithm estimates I/O execution time by modeling the I/O resources consumed during MapReduce job processing and, on that basis, defines a gain function and a cost evaluation function for consolidating multiple MapReduce jobs. A mathematical model of multi-MapReduce job consolidation is then formulated to find the combination of jobs that maximizes the consolidation benefit. In addition, the GMS algorithm, built on a MapReduce job performance evaluation model, is proposed to schedule multiple MapReduce jobs that have no execution-order constraints.
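The consolidation benefit idea can be illustrated with a small sketch. It assumes, hypothetically, that consolidating jobs means scanning the union of their input files once (in the spirit of the "input file sharing" keyword), that I/O time is input size divided by a fixed read bandwidth, and that every job folded into a group adds a fixed overhead. `READ_BANDWIDTH_MB_S`, `MERGE_OVERHEAD_S`, and all function names are illustrative; they are not the thesis's actual gain and cost functions. Exhaustive search over set partitions stands in for the thesis's mathematical model and is only practical for a handful of jobs.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical parameters (assumptions, not values from the thesis)
READ_BANDWIDTH_MB_S = 100.0  # assumed aggregate read bandwidth
MERGE_OVERHEAD_S = 5.0       # assumed cost per extra job folded into a group


def io_time(files: Set[str], sizes: Dict[str, float]) -> float:
    """Seconds needed to scan a set of input files (sizes in MB) once."""
    return sum(sizes[f] for f in files) / READ_BANDWIDTH_MB_S


def group_benefit(group: List[str], inputs: Dict[str, Set[str]],
                  sizes: Dict[str, float]) -> float:
    """Net benefit of running `group` as one consolidated job.

    gain = I/O time when each job scans its own inputs
           - I/O time when the union of their inputs is scanned once
    cost = MERGE_OVERHEAD_S for every job beyond the first
    """
    separate = sum(io_time(inputs[j], sizes) for j in group)
    shared = io_time(set().union(*(inputs[j] for j in group)), sizes)
    return (separate - shared) - MERGE_OVERHEAD_S * (len(group) - 1)


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every set partition of `items` (exponential; small inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part  # `first` runs on its own
        for i in range(len(part)):  # or joins an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_consolidation(inputs, sizes):
    """Return the grouping with the largest total net benefit, and that benefit."""
    best, best_value = None, float("-inf")
    for part in partitions(list(inputs)):
        value = sum(group_benefit(g, inputs, sizes) for g in part)
        if value > best_value:
            best, best_value = part, value
    return best, best_value


if __name__ == "__main__":
    sizes = {"logs": 1000, "users": 200, "clicks": 600}  # MB, made up
    inputs = {"J1": {"logs"}, "J2": {"logs", "users"}, "J3": {"clicks"}}
    print(best_consolidation(inputs, sizes))  # ([['J1', 'J2'], ['J3']], 5.0)
```

In the made-up example, J1 and J2 both read `logs`, so grouping them saves one 10 s scan at a 5 s overhead, while J3 shares nothing and is best left alone.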
GMS estimates the execution times of Map and Reduce tasks with a MapReduce job performance evaluation model, which allows the processing of MapReduce jobs to be abstracted as the classic two-stage flow shop scheduling problem. The thesis analyzes the shortcomings of the traditional Johnson algorithm when it is applied to MapReduce job scheduling and proposes a strategy that divides the resource slots into pools, which optimizes the global completion time and improves cluster utilization. Finally, the thesis builds a distributed Hadoop cluster as a lab environment, verifies the effectiveness of the proposed multi-MapReduce job scheduling framework for global makespan optimization, and analyzes the results in detail.
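Once each job is reduced to a map stage followed by a reduce stage, the textbook Johnson's rule that the thesis takes as its baseline can be sketched as below. This is the classic rule, not the thesis's GMS algorithm or its slot-pool strategy. The sketch assumes each job's stage times are already estimated; `stage_time` (tasks running in ceil(tasks / slots) waves of equal length) is a hypothetical stand-in for the thesis's performance evaluation model, and the job names and times are made up.

```python
import math
from typing import Dict, List, Tuple


def stage_time(num_tasks: int, task_time: float, slots: int) -> float:
    """Hypothetical wave model: tasks run in ceil(num_tasks / slots) waves."""
    return math.ceil(num_tasks / slots) * task_time


def johnson_order(jobs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Johnson's rule for a two-stage (map -> reduce) flow shop.

    `jobs` maps a job name to (map_time, reduce_time). Jobs whose map stage
    is no longer than their reduce stage run first, by ascending map time;
    the remaining jobs run last, by descending reduce time.
    """
    first = sorted((j for j, (m, r) in jobs.items() if m <= r),
                   key=lambda j: jobs[j][0])
    last = sorted((j for j, (m, r) in jobs.items() if m > r),
                  key=lambda j: jobs[j][1], reverse=True)
    return first + last


def makespan(order: List[str], jobs: Dict[str, Tuple[float, float]]) -> float:
    """Finish time of the last reduce stage when jobs run in `order`."""
    map_done = 0.0
    reduce_done = 0.0
    for j in order:
        m, r = jobs[j]
        map_done += m  # map stages run back to back on the map "machine"
        # a reduce stage starts after its own map stage and the previous reduce
        reduce_done = max(reduce_done, map_done) + r
    return reduce_done


if __name__ == "__main__":
    # Made-up (map_time, reduce_time) estimates, e.g. from stage_time()
    jobs = {"A": (3, 6), "B": (5, 2), "C": (1, 2), "D": (6, 6), "E": (7, 5)}
    order = johnson_order(jobs)
    print(order, makespan(order, jobs))  # ['C', 'A', 'D', 'E', 'B'] 24.0
```

For these numbers, running the jobs in name order (A to E) finishes at 27, while Johnson's order finishes at 24, which equals the total map time plus the shortest reduce stage. The sketch treats all map slots as one machine and all reduce slots as another; how the thesis relaxes this with slot pools is not reproduced here.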
Keywords/Search Tags: Cloud computing, MapReduce, Input file sharing, Scheduling model