Research And Improvement Of Resource Scheduler Algorithm Based On Hadoop

Posted on: 2016-01-28 | Degree: Master | Type: Thesis
Country: China | Candidate: S H An | Full Text: PDF
GTID: 2308330476453442 | Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the big data era, cloud computing technology has developed extremely quickly. Hadoop, a central platform for realizing cloud computing, has been widely adopted as a result. A Hadoop cluster is formed by connecting a large number of computers over a network, and users submit jobs to the cluster through a client to meet their application requirements. The resource scheduler is a core component of the Hadoop platform: it allocates resources to user-submitted jobs and schedules those jobs according to a chosen scheduling algorithm. The scheduling algorithm therefore directly affects the performance of the entire cluster and determines the quality of service that users receive, which makes research on Hadoop job scheduling algorithms highly significant.

The main contribution of this thesis is an improved job scheduling algorithm for Hadoop YARN. After analyzing the shortcomings of the three job schedulers built into Hadoop YARN, we propose a new algorithm called the Dependency Scheduler. The algorithm aims to improve the resource utilization of the Hadoop cluster and pays particular attention to the dependency between the map tasks and the reduce tasks of a MapReduce job. Scheduling is carried out in two stages, an initial phase and a real-time phase, with the goal of using cluster resources as fully as possible while minimizing the completion time of the whole MapReduce job set.

To verify the proposed algorithm, we built our own Hadoop cluster and compared the Dependency Scheduler with the three existing schedulers, using four benchmarks as workloads. The comparison is based on three metrics: node CPU usage, node memory usage, and the makespan of the whole job set. The experimental results show that the Dependency Scheduler improves both cluster resource utilization and the completion time of the MapReduce job set.
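As a rough illustration of the evaluation metrics named above, the following Python sketch computes the makespan of a job set and an average node utilization. It is not the thesis's measurement code: the `JobRecord` type, the function names, and all numeric values are hypothetical, and it assumes makespan is taken as the time from the earliest job submission to the latest job completion.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JobRecord:
    """Hypothetical record of one job's timing, in seconds."""
    name: str
    submit_time: float
    finish_time: float


def makespan(jobs: Sequence[JobRecord]) -> float:
    """Time from the earliest submission to the latest completion."""
    if not jobs:
        raise ValueError("job set is empty")
    first_submit = min(job.submit_time for job in jobs)
    last_finish = max(job.finish_time for job in jobs)
    return last_finish - first_submit


def mean_utilization(samples: Sequence[float]) -> float:
    """Average of usage samples (e.g. CPU or memory fractions in [0, 1])."""
    if not samples:
        raise ValueError("no samples")
    return sum(samples) / len(samples)


# Illustrative values only; these are not results from the thesis.
jobs = [
    JobRecord("job_a", submit_time=0.0, finish_time=40.0),
    JobRecord("job_b", submit_time=5.0, finish_time=95.0),
    JobRecord("job_c", submit_time=10.0, finish_time=70.0),
]
cpu_samples = [0.5, 0.75, 1.0, 0.75]

print(makespan(jobs))
print(mean_utilization(cpu_samples))
```

Under this definition, a scheduler that overlaps jobs more effectively lowers the makespan even when individual job durations are unchanged, which is why it is a natural metric for comparing schedulers over a whole job set.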
Keywords/Search Tags: cloud computing, Hadoop, MapReduce, resource scheduling algorithm