Research And Implementation Of Job Scheduling Algorithm In Cloud Environment

Posted on: 2022-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
Full Text: PDF
GTID: 2518306605468084
Subject: Software engineering
Abstract/Summary:
With the rise of the mobile Internet, cloud computing, artificial intelligence, and other new technologies, big data analytics has become increasingly vital to many modern enterprises. Today's big data processing systems, such as Hadoop MapReduce, Spark, and Flink, treat big data applications as batches of jobs to be scheduled. In these systems, the data processing logic is highly complex and the jobs being processed are dynamic. Existing job schedulers usually cannot deliver both high scheduling performance and efficient use of cluster resources at the same time. Designing a scheduler that reduces the jobs' average turnaround time and makespan while keeping resource use efficient and stable is a major challenge in both academia and industry.

To address these problems, this thesis proposes a job scheduling algorithm for big data processing systems in cloud environments, based on a job completion time prediction model and a scheduling gain model. First, the evaluation indicators of the job scheduling problem are analyzed and modeled. Then, based on the performance model, a formal definition of the job scheduling problem for big data processing systems in cloud environments is given. The goal is to reduce the jobs' average turnaround time and the makespan while staying within the user's resource usage range. The proposed scheduling algorithm is then compared with four classic job scheduling algorithms and conclusions are drawn. The specific research content includes the following aspects:

(1) The job scheduling problem of a big data processing system in a cloud environment is modeled. Specifically, the system is modeled from three aspects: the job scheduling process, the performance indicators of the scheduling algorithm, and the cluster resource utilization indicators. The model covers the factors involved in the job scheduling algorithm, such as the jobs' average turnaround time, the makespan, and resource utilization.

(2) Based on the relationship between job types and cluster resource utilization, a job completion time prediction model is established to predict the completion time of jobs under different levels of cluster resource utilization. A single job is executed independently under different resource utilization conditions, and its completion time is measured and recorded. A regression fit over these measurements then predicts the job's completion time under other resource utilization conditions.

(3) A performance-aware scheduling algorithm (PAS) is designed and implemented. In each scheduling interval, a greedy algorithm and a one-step lookahead algorithm are used to increase the scheduling gain of each job scheduling pass, thereby reducing the jobs' average turnaround time and the makespan.

The performance of PAS is verified on Hadoop YARN, an open-source big data job management system widely used in industry, and PAS is compared with four other scheduling algorithms: AHP, SJF, FIFO, and DRF. The experimental results show that PAS is more effective than the four comparison algorithms: the average job turnaround time improves by up to 42.08%, the makespan improves by up to 20.9%, and resource utilization is more stable and efficient, meeting the optimization goal.
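The abstract does not give the form of the regression or the definitions of the two metrics. The Python sketch below is an illustration only, under stated assumptions: it fits a straight line to completion times measured at different cluster utilization levels (the thesis says only "regression", so the linear form is an assumption), and it computes average turnaround time and makespan using their usual textbook definitions for jobs submitted at time 0 and run one after another. All function names and all numbers are hypothetical and do not come from the thesis. It does not implement PAS or the scheduling gain model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(model, utilization):
    """Predicted completion time at the given cluster utilization."""
    a, b = model
    return a + b * utilization


def schedule_metrics(durations):
    """Run jobs back to back in the given order, all submitted at time 0.

    Returns (average turnaround time, makespan).
    """
    clock = 0.0
    finish_times = []
    for d in durations:
        clock += d
        finish_times.append(clock)
    return mean(finish_times), clock


# Hypothetical measurements: (cluster utilization, completion time in seconds).
samples = [(0.2, 14.0), (0.4, 18.0), (0.6, 22.0), (0.8, 26.0)]
model = fit_line([u for u, _ in samples], [t for _, t in samples])
predicted = predict(model, 0.5)

# Hypothetical job durations in submission order.
jobs = [30.0, 10.0, 20.0]
fifo = schedule_metrics(jobs)          # first in, first out
sjf = schedule_metrics(sorted(jobs))   # shortest job first

print(f"predicted completion time at 50% utilization: {predicted:.1f} s")
print(f"FIFO: avg turnaround {fifo[0]:.2f} s, makespan {fifo[1]:.1f} s")
print(f"SJF:  avg turnaround {sjf[0]:.2f} s, makespan {sjf[1]:.1f} s")
```

In this toy single-executor setting, reordering the jobs changes the average turnaround time but not the makespan. This shows only that the two metrics can respond differently to the same scheduling decision.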
Keywords/Search Tags: big data, cloud computing, job scheduling, time prediction