
Design And Implementation Of Hadoop Resource-aware Scheduler

Posted on: 2019-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: B B Cai
Full Text: PDF
GTID: 2428330566493630
Subject: Engineering
Abstract/Summary:
The job scheduler is one of the core modules of a Hadoop distributed system, and its performance strongly affects both the resource utilization and the overall performance of the cluster. The existing job schedulers in Hadoop each focus on a single technical indicator, so the system cannot allocate computing resources (such as CPU and memory) in a timely manner to meet diverse user needs. To address this, this thesis proposes a Hadoop resource-aware scheduler aimed at the rational scheduling and allocation of resources. It optimizes the queue management and job management of the Hadoop distributed system, and designs and implements a resource-aware Hadoop job scheduler.

On the one hand, the resource-aware scheduler addresses the shortcomings in how current Hadoop distributed systems manage jobs through queues. Taking user needs into account, it divides the queues into three types: a queue for jobs with large resource requirements, a queue for jobs with normal requirements, and a queue for jobs with small requirements. Each job is assigned to one of these queues by comparing its resource request with the average available resources of the node lists, which yields a rational division of the queues and systematic management of the jobs.

On the other hand, the scheduler considers the supply side of cluster resources, namely the amount of CPU resources on each node. It proposes a node partitioning algorithm that uses CPU resources as the main criterion and manages the nodes in three lists according to their CPU resources: a list of nodes with large CPU resources, a list of nodes with normal CPU resources, and a list of nodes with small CPU resources. These three node lists correspond to the three types of job queues. Once jobs and resources are divided and managed in this way, the scheduler only needs to allocate resources to jobs from a matching pair of job queue and node list, which allocates resources to users more efficiently.

In designing the new job scheduling algorithm, the resource-aware scheduler treats algorithmic complexity as a primary concern and favors a low-complexity algorithm. Whenever a node updates its computing resources, the algorithm first searches for pending applications in the job queue that corresponds to the node's list. If there is a pending application, the node processes jobs from that queue first; otherwise, it temporarily lends its computing resources to jobs in the other queues.

Experiments in a real Hadoop distributed environment, comparing the resource-aware scheduler against Hadoop's three native schedulers, show that it better balances several technical indicators, including throughput, average job execution time, and job completion time. When handling small jobs, the resource-aware scheduler can reduce the total task completion time and also achieves higher read/write efficiency.
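The abstract does not give exact thresholds or data structures, so the following is only a minimal Python sketch of the described scheme under stated assumptions, not the thesis's implementation. The names `Tier`, `partition_nodes`, `classify_job`, `ResourceAwareScheduler`, and `on_node_update`, the 1.25×/0.75× partition thresholds, the "closest average free vcores" rule for classifying jobs, and the one-container-per-job simplification are all hypothetical. It models CPU (vcores) only, since the abstract names CPU as the main partitioning criterion.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Optional


class Tier(Enum):
    """The three job-queue / node-list tiers described in the abstract."""
    LARGE = "large"
    NORMAL = "normal"
    SMALL = "small"


@dataclass
class Node:
    name: str
    vcores: int     # total CPU capacity, used to partition nodes
    available: int  # currently free vcores


@dataclass
class Job:
    name: str
    vcores: int     # CPU request (simplified: the whole job is one container)


def partition_nodes(nodes: list[Node], hi: float = 1.25, lo: float = 0.75) -> dict[Tier, list[Node]]:
    """Split nodes into three lists by CPU capacity relative to the cluster mean.

    The hi/lo factors are hypothetical thresholds, not values from the thesis.
    """
    avg = mean(n.vcores for n in nodes)
    lists: dict[Tier, list[Node]] = {t: [] for t in Tier}
    for n in nodes:
        if n.vcores >= hi * avg:
            lists[Tier.LARGE].append(n)
        elif n.vcores <= lo * avg:
            lists[Tier.SMALL].append(n)
        else:
            lists[Tier.NORMAL].append(n)
    return lists


def classify_job(job: Job, node_lists: dict[Tier, list[Node]]) -> Tier:
    """Hypothetical rule: pick the queue whose node list has the average free
    vcores closest to the job's request (empty lists are skipped)."""
    averages = {t: mean(n.available for n in ns) for t, ns in node_lists.items() if ns}
    return min(averages, key=lambda t: abs(averages[t] - job.vcores))


class ResourceAwareScheduler:
    def __init__(self, nodes: list[Node]):
        self.node_lists = partition_nodes(nodes)
        self.tier_of = {n.name: t for t, ns in self.node_lists.items() for n in ns}
        self.queues: dict[Tier, deque] = {t: deque() for t in Tier}

    def submit(self, job: Job) -> Tier:
        tier = classify_job(job, self.node_lists)
        self.queues[tier].append(job)
        return tier

    def on_node_update(self, node: Node) -> Optional[tuple[Job, Tier]]:
        """Called when a node reports its resources (e.g. on a heartbeat).

        If the node's matching queue has pending jobs, only that queue is
        served; otherwise the node lends its free vcores to the other queues.
        At most one job (the head of a queue) is placed per update.
        """
        own = self.tier_of[node.name]
        if self.queues[own]:
            candidates = [own]
        else:
            candidates = [t for t in Tier if t is not own]
        for tier in candidates:
            queue = self.queues[tier]
            if queue and queue[0].vcores <= node.available:
                job = queue.popleft()
                node.available -= job.vcores
                return job, tier
        return None  # nothing fits right now; jobs wait for a later update


if __name__ == "__main__":
    nodes = [Node("n1", 16, 16), Node("n2", 8, 8), Node("n3", 4, 4)]
    sched = ResourceAwareScheduler(nodes)
    for job in [Job("big", 14), Job("mid", 7), Job("small", 2)]:
        print(job.name, "->", sched.submit(job).value)
    print(sched.on_node_update(nodes[0]))  # n1 serves the large queue first
    print(sched.on_node_update(nodes[0]))  # large queue empty: n1 lends to others
```

Under these assumptions, each node update only inspects the head of one or a few queues, which keeps the per-update cost low in the spirit of the complexity goal stated above. A production implementation would run inside the YARN ResourceManager and account for memory as well as CPU.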
Keywords/Search Tags: Hadoop YARN, Job scheduler, Queue management, Resource awareness