Research On Hadoop Scheduling Algorithm Based On Dynamic Resource Allocation In Heterogeneous Environment

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Lin

Full Text:PDF

GTID:2518306773967959

Subject:Automation Technology

Abstract/Summary:

Hadoop is a distributed framework for processing big data efficiently, and is widely used for million-scale data processing because of its reliability, fault tolerance and scalability. Task scheduling, a key technology of Hadoop, is mainly concerned with assigning computational tasks to specified nodes, and is an important research topic for improving resource utilization, shortening task execution time and increasing throughput. This thesis investigates the resource allocation problem in the Hadoop environment at the job level. It proposes a dynamic scheduling strategy based on job classification and a load-based locality scheduling strategy to address several important problems that have not yet been solved effectively: cluster heterogeneity, job locality and real-time requirements. Experiments are then conducted to verify the effectiveness of the proposed methods. The specific research work is as follows.

A job classification-based scheduling strategy is proposed to address cluster heterogeneity and the real-time requirements of jobs. First, two listeners are set up when the cluster is initialized, one listening for job arrival messages and the other for resource idle messages. Then, the listeners process each message according to its type and record the cluster resource usage for use in resource allocation. Finally, when a listener receives a job arrival message, it classifies the job according to its estimated completion time and the job arrival rate, and assigns the classified job to the corresponding queue for scheduling. The experimental results show that this method effectively reduces job execution time and CPU time, and improves cluster performance.

A node load-based locality scheduling strategy is proposed to address the problem of data locality. First, the concept of load rate is introduced: the load rate indicates the resource usage of the cluster, and the remaining resources are calculated from it. Then, for each job queue, the tasks in the queue are traversed. If a task is a local task of the current node, it is scheduled first. If it is a non-local task, its information is saved and the traversal continues with the next task. Finally, once all local tasks have been scheduled, the non-local tasks are assigned to nodes with lower load according to the load rate and resource usage. The experiments show that this method reduces the amount of data transmitted across nodes, reduces the number of non-local tasks, and improves locality.
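The two-pass structure of the load-based locality strategy can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Node`, `Task` and `schedule`, the use of task slots as the resource unit, and the definition of load rate as used slots divided by capacity are all assumptions made for illustration, since the abstract does not give the exact formula. The sketch also places tasks for the whole cluster in one call, rather than per node heartbeat as a real Hadoop scheduler would.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int  # total task slots (hypothetical resource unit)
    used: int = 0  # slots currently occupied

    @property
    def load_rate(self) -> float:
        # Assumed definition: fraction of the node's slots in use.
        return self.used / self.capacity

    @property
    def remaining(self) -> int:
        # Remaining resources derived from current usage.
        return self.capacity - self.used


@dataclass
class Task:
    task_id: str
    data_nodes: set  # names of the nodes that hold this task's input data


def schedule(queue, nodes):
    """Return a {task_id: node_name} placement for one job queue."""
    by_name = {n.name: n for n in nodes}
    placement = {}
    deferred = []  # non-local tasks saved for the second pass

    # Pass 1: schedule every task that can run on a node holding its data.
    for task in queue:
        local = [by_name[n] for n in sorted(task.data_nodes)
                 if n in by_name and by_name[n].remaining > 0]
        if local:
            target = min(local, key=lambda n: n.load_rate)
            target.used += 1
            placement[task.task_id] = target.name
        else:
            deferred.append(task)

    # Pass 2: send the saved non-local tasks to the least-loaded nodes.
    for task in deferred:
        candidates = [n for n in nodes if n.remaining > 0]
        if not candidates:
            break  # cluster is full; the rest wait for the next round
        target = min(candidates, key=lambda n: n.load_rate)
        target.used += 1
        placement[task.task_id] = target.name

    return placement


# Usage: node "a" has room for only two of its three local tasks,
# so the third is deferred and placed on the less-loaded node "b".
demo_nodes = [Node("a", capacity=2), Node("b", capacity=4)]
demo_queue = [Task("t1", {"a"}), Task("t2", {"a"}),
              Task("t3", {"a"}), Task("t4", {"b"})]
print(schedule(demo_queue, demo_nodes))
```

Deferring non-local tasks until the local pass has finished means a node's free slots are first offered to tasks whose data it already holds, which is what limits cross-node data transfer in this sketch.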

Keywords/Search Tags:

Hadoop, Task scheduling, Resource allocation, Job classification, Load rate

Related items

1	Research On Task Scheduling Based On Load Balancing And Task Overtime Rate
2	Design And Implementation Of YARN Resource Scheduling Strategy Optimization Method
3	Task Resource Allocation And Control System Based On Hadoop Design And Implementation
4	Research On Task Scheduling Algorithms Based On Pre-Release Resource List In Hadoop
5	Research And Application Of Real-time Scheduling Method Based On Resource Granularity
6	Research On Resource Allocation And Scheduling In Hadoop YARN
7	Research And Improvement Of Hadoop YARN Resource Allocation Mechanism
8	Research On Resource Scheduling And Load Balancing Based On Cloud Computing
9	Study On Hadoop Resource Scheduling Strategy Based On IaaS Cloud Platform
10	Research Of Task Partition And Resource Allocation Algorithms For Load Balance In Spark Computing Environment