Research On Hadoop Cluster Scheduling Optimization

Posted on: 2016-10-25  Degree: Master  Type: Thesis
Country: China  Candidate: C He
GTID: 2208330461986024  Subject: Computer application technology
Abstract/Summary:
The scalability of cloud infrastructures has significantly broadened their applicability. Hadoop, which is based on the MapReduce model, enables efficient processing of big data and is widely used by cloud providers. Hadoop schedulers, which assign MapReduce tasks to Hadoop resources, are critical to achieving the desired performance levels. Scheduling a growing number of tasks and resources in a scalable manner is a considerable challenge, and the potentially heterogeneous nature of deployed Hadoop systems makes it harder still. Hadoop's original task scheduling algorithm cannot meet the performance requirements of heterogeneous clusters.

This thesis analyzes the performance of widely used Hadoop schedulers: three job schedulers, FIFO, Fair Sharing, and COSHC (Classification and Optimization based Scheduler for Heterogeneous Clusters), and one task scheduling algorithm, ATSDA (Adaptive Task Scheduling strategy based on Dynamic workload Adjustment). FIFO is the default Hadoop scheduler; it orders jobs in a queue by arrival time, ignoring any heterogeneity in the system. Fair Sharing was introduced to address FIFO's shortcomings in handling small jobs and user heterogeneity; it defines a pool for each user, where each pool consists of a number of map and reduce slots on a resource. COSHC takes both system and user heterogeneity into account when making scheduling decisions: using system information, it classifies incoming jobs and matches job classes to resources based on the requirements of each class and the features of each resource. Based on these insights, a Complex scheduling optimizer is introduced that selects appropriate scheduling algorithms for scalable and heterogeneous Hadoop systems.
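To make the contrast between the three job schedulers concrete, here is a minimal sketch in Python. It is not the thesis's implementation: `Job`, `fifo_order`, `fair_share_next`, and `coshc_like_match` are hypothetical names, Fair Sharing is reduced to "give the next slot to the user holding the fewest slots", and COSHC's optimization step is replaced by a simple greedy score that matches each job class to the resource whose features best fit its requirements.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    arrival: float  # submission time


def fifo_order(jobs):
    """FIFO: order jobs purely by arrival time, ignoring any heterogeneity."""
    return sorted(jobs, key=lambda j: j.arrival)


def fair_share_next(jobs, slots_in_use):
    """Simplified Fair Sharing: one pool per user. The next free slot goes to
    the pool whose user currently holds the fewest slots; inside a pool,
    jobs are served FIFO. Ties are broken by user name (an assumption)."""
    pools = {}
    for job in fifo_order(jobs):
        pools.setdefault(job.user, []).append(job)
    user = min(pools, key=lambda u: (slots_in_use.get(u, 0), u))
    return pools[user][0]


def coshc_like_match(job_classes, resources):
    """Hypothetical COSHC-style step: map each job class to the resource whose
    features best fit the class's requirements. A greedy dot-product score
    stands in for COSHC's actual optimization."""
    def fit(demand, features):
        return sum(w * features.get(k, 0.0) for k, w in demand.items())

    return {
        cls: max(resources, key=lambda r: (fit(demand, resources[r]), r))
        for cls, demand in job_classes.items()
    }


jobs = [Job("j1", "alice", 0.0), Job("j2", "alice", 1.0), Job("j3", "bob", 2.0)]
print([j.job_id for j in fifo_order(jobs)])                 # ['j1', 'j2', 'j3']
print(fair_share_next(jobs, {"alice": 3, "bob": 0}).job_id)  # j3
classes = {"cpu_bound": {"cpu": 1.0, "io": 0.1}, "io_bound": {"cpu": 0.1, "io": 1.0}}
nodes = {"cpu_node": {"cpu": 2.0, "io": 0.5}, "disk_node": {"cpu": 0.5, "io": 2.0}}
print(coshc_like_match(classes, nodes))  # {'cpu_bound': 'cpu_node', 'io_bound': 'disk_node'}
```

The usage lines show the key difference: FIFO would hand the next slot to alice's `j1` regardless of load, whereas the fair-share rule gives it to bob, who holds no slots, and the class matcher routes CPU-bound and I/O-bound work to differently equipped nodes.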
The Complex scheduling optimizer uses a hybrid solution for job scheduling and ATSDA for task scheduling. The hybrid solution combines FIFO, Fair Sharing, and COSHC, selecting the appropriate algorithm according to the number of incoming jobs and the available resources. With ATSDA, TaskTrackers adapt to load changes at runtime and obtain tasks according to their own computing capability, so the cluster regulates itself. This also keeps complex scheduling computation off the JobTracker, which is the main reason the JobTracker becomes a system performance bottleneck. Experimental results show that ATSDA is an efficient and reliable algorithm that keeps heterogeneous Hadoop clusters stable, scalable, efficient, and load-balanced. Furthermore, it outperforms Hadoop's original and improved task scheduling strategies in task execution time, resource utilization, and speed-up ratio.
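The two halves of the optimizer can be sketched as follows, under loudly stated assumptions. The abstract gives neither the hybrid solution's selection thresholds nor ATSDA's load model, so `select_scheduler` uses a hypothetical rule (FIFO when waiting jobs fit the free slots, Fair Sharing under moderate load, COSHC when heavily loaded, with an assumed `overload_factor`), and the ATSDA-style part models a TaskTracker's computing ability as a hypothetical `max_slots` value scaled by its current `cpu_load`. The point it illustrates is the division of labor: each TaskTracker decides how many tasks to pull, while the JobTracker side (`heartbeat`) only pops tasks off a queue.

```python
from collections import deque
from dataclasses import dataclass


def select_scheduler(waiting_jobs, free_slots, overload_factor=2.0):
    """Hypothetical hybrid rule: pick a job scheduler from the ratio of
    incoming jobs to available resources. Thresholds are assumptions."""
    if waiting_jobs <= free_slots:
        return "FIFO"          # under-loaded: simple ordering suffices
    if waiting_jobs <= overload_factor * free_slots:
        return "FairSharing"   # moderately loaded: share slots between users
    return "COSHC"             # heavily loaded: heterogeneity-aware matching


@dataclass
class TaskTracker:
    name: str
    max_slots: int   # hypothetical stand-in for the node's computing ability
    running: int = 0

    def tasks_to_request(self, cpu_load):
        """ATSDA-style self-regulation (sketch): the node scales its own
        capacity by how idle it currently is (cpu_load in [0, 1])."""
        usable = int(self.max_slots * (1.0 - cpu_load))
        return max(0, usable - self.running)

    def task_finished(self):
        self.running -= 1


def heartbeat(tracker, task_queue, cpu_load):
    """JobTracker side: it only pops tasks. The sizing decision is made by
    the TaskTracker, so the JobTracker does no per-node computation."""
    n = min(tracker.tasks_to_request(cpu_load), len(task_queue))
    granted = [task_queue.popleft() for _ in range(n)]
    tracker.running += len(granted)
    return granted


queue = deque(range(10))
fast, slow = TaskTracker("fast", max_slots=8), TaskTracker("slow", max_slots=2)
print(select_scheduler(len(queue), free_slots=10))  # FIFO
print(heartbeat(fast, queue, cpu_load=0.25))        # [0, 1, 2, 3, 4, 5]
print(heartbeat(slow, queue, cpu_load=0.0))         # [6, 7]
print(heartbeat(fast, queue, cpu_load=0.5))         # []
```

In this sketch the faster node pulls six tasks and the slower one two, and once the fast node's load rises it stops requesting work until some of its running tasks finish; a real implementation would replace the fixed `max_slots` with measured per-node throughput.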
Keywords/Search Tags: Hadoop cluster performance, MapReduce computing framework, Complex scheduling optimizer, Hybrid scheduling solution, ATSDA task scheduling