Research On Scheduling Optimization In Heterogeneous Hadoop Clusters Based On Dynamically Adjusting Node Resource

Posted on:2016-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:C C Yin

Full Text:PDF

GTID:2348330479453367

Subject:Computer system architecture

Abstract/Summary:

The rapid development of e-commerce, finance, and social networks has brought unprecedented growth in data scale. Storing and processing large-scale data efficiently has become a hot topic in the Internet field. Hadoop, an open-source implementation of the GFS and MapReduce models, is a distributed computing and storage platform. Many clusters today are heterogeneous, and traditional scheduling strategies work inefficiently on them. Based on a study of Hadoop scheduling, we propose a self-adaptive scheduler for heterogeneous clusters. First, we propose an algorithm that dynamically adjusts node resources. In a heterogeneous cluster, nodes have different resources and different capacities for running tasks. Hadoop uses slots to describe resources, and every task must obtain a slot to run, so configuring a fixed number of slots is not suitable for heterogeneous clusters. The algorithm collects the running state of jobs, analyzes node capabilities, and then adjusts the number of slots on each node. Moreover, the algorithm classifies jobs as CPU-intensive or I/O-intensive and sets different slot numbers for them.

Second, we propose an optimization algorithm for short jobs. The two traditional job scheduling algorithms are FIFO and FAIR, but both ignore the differences between jobs. Because nodes in a heterogeneous cluster run tasks at different speeds, slots can be divided into a fast slot pool and a slow slot pool; fast slots are then allocated preferentially to interactive tasks, and some slots are reserved for them. This policy shortens the completion time of interactive jobs without affecting the running of long batch jobs. Finally, we use HiBench to test the scheduler under different scenarios, including tests with the Fair Scheduler and the Coupling Scheduler. The results show that our scheduler adapts to heterogeneous clusters based on the characteristics of jobs and the differences between cluster nodes. The improved scheduler significantly reduces the task execution time on each node and improves the throughput and resource utilization of the cluster.
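The two ideas in the abstract, adjusting each node's slot count from its observed load and serving interactive jobs from a reserved pool of fast slots, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the `Node` fields, the utilization thresholds, the one-slot step, the fast-node fraction, and the reservation size are all hypothetical values chosen for illustration, and none of the names correspond to Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    slots: int        # currently configured slot count
    cpu_util: float   # hypothetical load metric in [0, 1]
    task_rate: float  # hypothetical speed metric: tasks completed per minute


def adjust_slots(node, low=0.5, high=0.9, min_slots=1, max_slots=16):
    """Return a new slot count for the node (thresholds are assumptions).

    An overloaded node loses one slot, an underused node gains one,
    and a node inside the [low, high] band keeps its current count.
    """
    if node.cpu_util > high:
        return max(min_slots, node.slots - 1)
    if node.cpu_util < low:
        return min(max_slots, node.slots + 1)
    return node.slots


def split_pools(nodes, fast_fraction=0.5):
    """Rank nodes by task rate; the fastest fraction forms the fast pool."""
    ranked = sorted(nodes, key=lambda n: n.task_rate, reverse=True)
    cut = max(1, int(len(ranked) * fast_fraction))
    return ranked[:cut], ranked[cut:]


def pick_pool(interactive, fast_free, slow_free, reserved=1):
    """Choose a pool for the next task, or None if it has to wait.

    Interactive tasks take a fast slot first and fall back to a slow one.
    Batch tasks prefer slow slots and may use a fast slot only while more
    than `reserved` fast slots remain free.
    """
    if interactive:
        if fast_free > 0:
            return "fast"
        if slow_free > 0:
            return "slow"
        return None
    if slow_free > 0:
        return "slow"
    if fast_free > reserved:
        return "fast"
    return None


nodes = [
    Node("a", slots=4, cpu_util=0.95, task_rate=10.0),
    Node("b", slots=4, cpu_util=0.30, task_rate=30.0),
    Node("c", slots=4, cpu_util=0.70, task_rate=20.0),
    Node("d", slots=2, cpu_util=0.20, task_rate=5.0),
]
new_slots = {n.name: adjust_slots(n) for n in nodes}
fast_pool, slow_pool = split_pools(nodes)
```

In this sketch a batch task is refused the last free fast slot, which is what keeps capacity available for an interactive job arriving later; a real scheduler would also need to handle the CPU-intensive versus I/O-intensive distinction described above, which is omitted here.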

Keywords/Search Tags:

Big Data, Hadoop, task scheduling, heterogeneous clusters

Related items

1	SLA-based Adaptive Job Scheduling In Heterogeneous Hadoop Clusters
2	Study On Computing Task Scheduling Optimization Based On Hadoop Job
3	Optimization And Research Of Hadoop Scheduling Algorithm In Hadoop Heterogeneous Environment
4	A Resource Management Strategy Based On Data Compression Ratio For MapReduce On Hadoop Clusters
5	Research And Implementation Of Resource Scheduling Algorithm In Hadoop Heterogeneous Cluster
6	Research And Improvement Of Task Scheduling Algorithm In Hadoop
7	Research And Implementation Of Resource Scheduling Algorithm Based On Hadoop Heterogeneous Cluster
8	Video Transcoding And Optimization Of Heterogeneous Distributed Clusters
9	Research On Task Scheduling Algorithms In MapReduce Clusters
10	Research On Task Scheduling Algorithms Based On Pre-Release Resource List In Hadoop