
Research And Design Of Real-time Performance Of Job Scheduling Based On Hadoop Cluster

Posted on: 2020-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Dong
GTID: 2428330572481088
Subject: Engineering

Abstract/Summary:
Hadoop is currently the most widely used distributed cloud computing platform. Job scheduling is one of its key technologies and directly affects both platform performance and system resource utilization, so studying job scheduling on Hadoop is of great significance. As industries develop, user demands are becoming more diverse, and applications built on hybrid task sets with time constraints are increasingly common. Users of such applications care most about real-time performance, which makes designing a hybrid task set job scheduler that improves Hadoop's real-time performance one of the hot topics in this field. The problem is that Hadoop's existing job schedulers give little consideration to real-time performance, base job queue scheduling decisions on a single factor (a one-tuple), and do not adequately support multiple task types.

To solve these problems, this thesis proposes a new job scheduling strategy for hybrid task sets, based on dynamic priority scheduling. In the dynamic priority scheduling algorithm, the scheduling decision is extended from a one-tuple to a four-tuple consisting of job urgency, job waiting time, job task value, and expected job completion time, which improves both the overall scheduling performance and the real-time performance of the Hadoop platform. In terms of implementation, a new job scheduler is created that inherits the TaskScheduler interface and is loaded and invoked in the ResourceManager. By default, Hadoop has only five job priority levels, which cannot clearly reflect how urgent a job is. This thesis therefore also gives a proportional mapping formula for priority, with which the default job priority can be expanded to an arbitrary integer range, so that urgency is reflected more accurately when the number of jobs is large. The new job priority is calculated by the new scheduling algorithm, and the job with the highest priority is selected for allocation of system resources. Job priorities change dynamically over time: each time a job finishes, the job queue is traversed once to ensure that the job with the highest priority in the queue is executed first.

A Hadoop cluster is built for the experiments, in which the dynamic priority scheduling algorithm is compared with a static priority scheduling algorithm that considers job urgency and with FIFO, the default scheduling algorithm on the Hadoop platform. The results show that the dynamic priority scheduling algorithm is more flexible, shortens the average waiting time of jobs in the queue, schedules the job queue more suitably, and improves the utilization of system resources. A weight-setting experiment verifies that users can set the parameter weights dynamically according to their actual needs, and that different weight settings produce different job queue orderings. It is concluded that the new scheduling algorithm better matches the diversified needs of users and meets the needs of different kinds of users.
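The abstract does not state the mapping formula or the priority function, so the following is only a minimal sketch under stated assumptions: the proportional mapping is taken to be linear, the four factors are combined as a weighted sum of normalized values, and shorter expected completion times are assumed to score higher. All names (`map_priority`, `dynamic_priority`, `pick_next`, `Weights`) and the normalization constants are hypothetical and do not come from the thesis or from Hadoop.

```python
from dataclasses import dataclass

# Hadoop's five default job priority levels, lowest to highest.
DEFAULT_LEVELS = ("VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH")

# Hypothetical normalization constants so that every factor falls in [0, 1].
MAX_WAIT_S = 1000.0
MAX_VALUE = 10.0
MAX_RUNTIME_S = 1000.0


def map_priority(level: str, new_max: int) -> int:
    """Map a default level proportionally onto 1..new_max (assumed linear formula)."""
    if new_max < 1:
        raise ValueError("new_max must be at least 1")
    idx = DEFAULT_LEVELS.index(level)  # 0 (lowest) .. 4 (highest)
    return 1 + idx * (new_max - 1) // (len(DEFAULT_LEVELS) - 1)


@dataclass(frozen=True)
class Job:
    name: str
    urgency: int             # urgency already mapped onto 1..urgency_max
    submit_time: float       # submission time, seconds
    value: float             # task value, 0..MAX_VALUE
    expected_runtime: float  # expected completion time, seconds


@dataclass(frozen=True)
class Weights:
    urgency: float = 0.4
    waiting: float = 0.3
    value: float = 0.2
    runtime: float = 0.1


def dynamic_priority(job: Job, now: float, w: Weights, urgency_max: int = 100) -> float:
    """Weighted sum of the four normalized factors (assumed form)."""
    waited = max(now - job.submit_time, 0.0)
    return (
        w.urgency * job.urgency / urgency_max
        + w.waiting * min(waited / MAX_WAIT_S, 1.0)
        + w.value * min(job.value / MAX_VALUE, 1.0)
        # Assumption: jobs expected to finish sooner score higher.
        + w.runtime * (1.0 - min(job.expected_runtime / MAX_RUNTIME_S, 1.0))
    )


def pick_next(queue: list, now: float, w: Weights) -> Job:
    """Traverse the queue once and return the job with the highest priority."""
    return max(queue, key=lambda job: dynamic_priority(job, now, w))


# A long-waiting low-urgency job and a freshly submitted high-urgency job.
batch = Job("batch", map_priority("LOW", 100), 0.0, 5.0, 500.0)
urgent = Job("urgent", map_priority("VERY_HIGH", 100), 900.0, 5.0, 500.0)
queue = [batch, urgent]
```

With the default weights in this sketch, `pick_next(queue, 900.0, Weights())` selects `urgent`. With a waiting-heavy setting such as `Weights(0.1, 0.6, 0.2, 0.1)` it selects `batch`, which illustrates how different weight settings can reorder the same queue.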
Keywords/Search Tags: Hadoop cluster, job scheduling algorithm, dynamic priority, real-time performance, proportional priority mapping