Research On Hadoop Scheduling Algorithm Based On Dynamic Resource Allocation In Heterogeneous Environment

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Lin
GTID: 2518306773967959
Subject: Automation Technology
Abstract/Summary:
Hadoop is a distributed framework for processing big data efficiently, and is widely used in million-level data processing because of its high reliability, fault tolerance and scalability. Task scheduling, a key technology of Hadoop, is mainly concerned with assigning computational tasks to specified nodes, and is of considerable research significance for improving resource utilization, shortening task execution time and increasing throughput. This thesis investigates the resource allocation problem in the Hadoop environment at the job level. For several important problems that have not yet been solved effectively, namely cluster heterogeneity, job locality and real-time requirements, it proposes a dynamic scheduling strategy based on job classification and a load-based locality scheduling strategy. Experiments are then conducted to verify the effectiveness of the proposed methods. The specific research work is as follows.

A job classification-based scheduling strategy is proposed to address cluster heterogeneity and the real-time requirements of jobs. First, two listeners are set up in the initial stage of the cluster, one listening for job arrival messages and the other for resource idle messages. Then, the listeners process each message according to its type and record the cluster resource usage for use in resource allocation. Finally, when a listener receives a job arrival message, it classifies the job according to its estimated completion time and the job arrival rate, and assigns the classified job to the corresponding queue for scheduling. The experimental results show that this method effectively reduces job execution time and CPU time, and improves cluster performance.

To address the problem of data locality, a node load-based locality scheduling strategy is proposed. First, the concept of a load ratio (load rate) is introduced. The load ratio indicates the cluster resource usage, and the remaining resources are calculated from it. Then, for each job queue, the tasks in the queue are traversed. If a task is local to the current node, it is scheduled first. If it is a non-local task, its information is saved and the traversal continues with the next task. Finally, once all local tasks have been scheduled, the non-local tasks are assigned to nodes with lower load according to the load ratio and resource usage. Experiments show that this method reduces the amount of data transmitted across nodes, reduces the generation of non-local tasks, and improves locality.
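
The two strategies summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives no formulas, so the slot-based load ratio (used / capacity), the classification thresholds, the queue names, and every identifier here (`Node`, `Task`, `classify_job`, `schedule_queue`) are assumptions chosen only to make the described steps concrete.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int   # total task slots (hypothetical resource unit)
    used: int = 0   # slots currently occupied

    @property
    def load_ratio(self) -> float:
        # Assumed definition: fraction of this node's resources in use.
        return self.used / self.capacity

    @property
    def remaining(self) -> int:
        # Remaining resources derived from the current usage.
        return self.capacity - self.used


@dataclass
class Task:
    task_id: str
    data_nodes: frozenset  # names of nodes that hold this task's input data


def classify_job(estimated_seconds, arrival_rate,
                 time_threshold=60.0, rate_threshold=1.0):
    """Choose a queue from estimated completion time and job arrival rate.

    The thresholds and queue names are illustrative assumptions.
    """
    if estimated_seconds <= time_threshold:
        return "short"
    if arrival_rate >= rate_threshold:
        return "busy-long"
    return "long"


def schedule_queue(tasks, current, nodes):
    """Return {task_id: node_name} for one traversal of a job queue."""
    assignment = {}
    deferred = []
    # Step 1: tasks local to the current node are scheduled first.
    for task in tasks:
        if current.name in task.data_nodes and current.remaining > 0:
            current.used += 1
            assignment[task.task_id] = current.name
        else:
            # Non-local tasks are saved for later. Assumption: a local
            # task that finds the current node full is deferred as well.
            deferred.append(task)
    # Step 2: deferred tasks go to the node with the lowest load ratio
    # among those that still have free capacity.
    for task in deferred:
        candidates = [n for n in nodes if n.remaining > 0]
        if not candidates:
            break  # cluster is full; the rest stay unscheduled
        target = min(candidates, key=lambda n: n.load_ratio)
        target.used += 1
        assignment[task.task_id] = target.name
    return assignment
```

In this sketch the load ratio is recomputed after every assignment, so a node that absorbs several non-local tasks gradually becomes less attractive than its neighbours. A real scheduler would also weigh CPU and memory rather than a single slot count.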
Keywords/Search Tags:Hadoop, Task scheduling, Resource allocation, Job classification, Load rate