Research And Implementation Of Scheduling Algorithm Based On MapReduce Cluster

Posted on: 2016-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: Z Z Sun  Full Text: PDF
GTID: 2358330479955441  Subject: Computer application technology
Abstract/Summary:
Modern internet applications have created a demand for massively parallel data processing, and more and more computing tasks have to be completed on clusters with tens of thousands of independent computing nodes. Hadoop MapReduce emerged in response as a new generation of programming model designed to process massive distributed data sets. The biggest advantage of this model is that it enables large-scale parallel computing. A MapReduce cluster (also called the Hadoop platform) is an environment in which multiple users, jobs and tasks share the same physical resources. The scheduling algorithms largely determine cluster performance, resource utilization and user experience. Therefore, the study of scheduling algorithms on the Hadoop platform has important theoretical value and practical significance. This paper first studies job scheduling problems and current scheduling algorithms in the cloud environment, and then focuses on analyzing the job execution mechanisms, the job scheduling mechanisms and several current scheduling algorithms on the Hadoop platform, including the ideas, steps, advantages and disadvantages of each algorithm. On this basis, this paper further analyzes the problems of the current Hadoop scheduling algorithms with respect to quality of service (QoS), data locality and resource utilization, and then proposes a two-level scheduling model based on game theory, consisting of job-level scheduling and task-level scheduling. For job-level scheduling, this paper models the problem as a dynamic non-cooperative game and proposes a job selection method based on a QoS bidding model, in order to optimize the assessment of job priority and select jobs quantitatively.
For task-level scheduling, this paper models the problem as a cooperative game and proposes an improved task scheduling algorithm based on the Hungarian algorithm to minimize the job completion cost and reduce the job response time. In addition, an improved task scheduling algorithm based on minimum cost flow is proposed, which not only reduces job response time but also improves data locality and achieves load balance. Finally, a cluster environment is built and the proposed algorithms are implemented; simulation experiments are conducted, and the experimental results are compared and analyzed in terms of data locality, job response time and load balance, which verifies the validity and effectiveness of the improved algorithms.
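As an illustration of the task-level idea, the following is a minimal sketch of the classic Hungarian (Kuhn-Munkres) assignment method applied to a task-to-node cost matrix. It is not the improved algorithm proposed in the thesis, whose details are not given in this abstract. The function name `hungarian`, the cost matrix and the interpretation of a cost (for example, a lower cost for a node that already holds the task's data) are assumptions made only for this example.

```python
def hungarian(cost):
    """Classic Hungarian method with potentials, O(n^2 * m).

    cost[i][j] is the (hypothetical) cost of running task i on node j.
    Requires len(cost) <= len(cost[0]), i.e. at least as many nodes as tasks.
    Returns (total_cost, assignment), where assignment[i] is the node for task i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)      # potentials for tasks (1-indexed)
    v = [0] * (m + 1)      # potentials for nodes (1-indexed)
    p = [0] * (m + 1)      # p[j] = task currently matched to node j, 0 = none
    way = [0] * (m + 1)    # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = INF
            j1 = 0
            # Find the unused node with the smallest reduced cost.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free node: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


if __name__ == "__main__":
    # Hypothetical costs: rows are tasks, columns are nodes.
    example_cost = [
        [4, 1, 3],
        [2, 0, 5],
        [3, 2, 2],
    ]
    print(hungarian(example_cost))
```

In this sketch each task receives exactly one node and the summed cost is minimal over all one-to-one assignments. A real scheduler would additionally have to derive the costs from cluster state, which the abstract does not specify.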
Keywords/Search Tags: Hadoop, MapReduce, Job scheduling, Bidding model, Hungarian algorithm, Minimum cost flow