Research On Scheduling Optimization In Heterogeneous Hadoop Clusters Based On Dynamically Adjusting Node Resource

Posted on: 2016-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: C C Yin
Full Text: PDF
GTID: 2348330479453367
Subject: Computer system architecture
Abstract/Summary:
The rapid development of e-commerce, finance, and social networks has brought unprecedented growth in data scale. Efficiently storing and processing large-scale data has become a hot topic in the Internet field. Hadoop, an open source implementation of GFS and the MapReduce model, is a distributed computing and storage platform. Many clusters today are heterogeneous, and traditional scheduling strategies work inefficiently on them. Based on a study of Hadoop scheduling, we propose a self-adaptive scheduler for heterogeneous clusters.

First, we propose an algorithm for heterogeneous clusters based on dynamically adjusting node resources. In a heterogeneous cluster, nodes have different resources and different capacities for running tasks. Hadoop uses slots to describe resources, and every task needs to obtain a slot in order to run, so configuring a fixed number of slots is not suitable for heterogeneous clusters. The algorithm collects the running state of jobs, analyzes node capabilities, and then adjusts the number of slots on each node. Moreover, the algorithm classifies jobs as CPU-intensive or I/O-intensive and sets different slot numbers for them.

Second, we propose an optimization algorithm for short jobs. The two traditional job scheduling algorithms are FIFO and FAIR. However, both ignore the differences between jobs. Because nodes run tasks at different speeds in heterogeneous clusters, we divide the slots into a fast slot pool and a slow slot pool, then preferentially allocate fast slots to interactive tasks and reserve some slots for them. This scheduling policy shortens the completion time of interactive jobs without affecting the running of long batch jobs.

Finally, we use HiBench to test under different scenarios, including tests with the Fair Scheduler and the Coupling Scheduler. The results show that our scheduler can adapt itself to heterogeneous clusters based on the characteristics of jobs and the differences between cluster nodes. The improved scheduler can significantly reduce the task execution time on each node and improve the throughput and resource utilization of the cluster.
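The two ideas in the abstract, per-node slot adjustment and fast/slow slot pools, can be illustrated with a small Python sketch. This is not the thesis implementation and does not use any Hadoop API. The `Node` fields, the load thresholds, the halved slot cap for CPU-intensive jobs, the duration threshold that separates fast from slow nodes, and the `reserved` count are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    slots: int                # slots currently configured on this node
    load: float               # observed utilization, 0.0 to 1.0 (assumed metric)
    avg_task_seconds: float   # observed mean task duration (assumed metric)


def adjust_slots(node, cpu_intensive, low=0.5, high=0.9, min_slots=1, max_slots=8):
    """Grow the slot count when the node is underused, shrink it when overloaded.

    Hypothetical rule: CPU-intensive jobs get a lower cap than I/O-intensive ones.
    """
    cap = max_slots // 2 if cpu_intensive else max_slots
    if node.load > high:
        target = node.slots - 1
    elif node.load < low:
        target = node.slots + 1
    else:
        target = node.slots
    return max(min_slots, min(cap, target))


def split_pools(nodes, threshold_seconds):
    """Partition slots into (fast, slow) pools by each node's mean task duration."""
    fast, slow = [], []
    for node in nodes:
        pool = fast if node.avg_task_seconds <= threshold_seconds else slow
        pool.extend([node.name] * node.slots)
    return fast, slow


def assign_slot(fast, slow, interactive, reserved=1):
    """Pop a slot for a task, or return None if no slot is available.

    Interactive tasks take a fast slot first and fall back to a slow one.
    Batch tasks take a slow slot first and may use fast slots only while
    more than `reserved` fast slots remain free.
    """
    if interactive:
        if fast:
            return fast.pop()
        return slow.pop() if slow else None
    if slow:
        return slow.pop()
    if len(fast) > reserved:
        return fast.pop()
    return None
```

In this sketch a batch task can never take the last `reserved` fast slots, so an interactive task that arrives later still finds a fast slot, which mirrors the reservation described in the abstract. How the thesis actually measures node capability and sizes the pools is not stated here.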
Keywords/Search Tags: Big Data, Hadoop, task scheduling, heterogeneous clusters