Research On Flexible Task Scheduling In Large-scale Data Parallel Processing Applications

Posted on: 2020-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Huang
GTID: 2438330572987383
Subject: Computer technology
Abstract/Summary:
Hadoop, a distributed computing framework, effectively addresses the problems of data storage and computation in large-scale parallel data processing. The cluster's resource allocation strategy and scheduling mode have great practical significance for overall system performance. Researchers have proposed many heuristic algorithms that improve data locality, job completion time, and system throughput. However, most of these heuristics schedule tasks greedily and lack holistic planning across tasks. Because data is stored in a distributed fashion across the cluster, the shuffle phase can cause network congestion that prevents jobs from completing quickly. As data volumes grow rapidly, designing a reasonable resource allocation strategy that frees up bandwidth in the top layer of the network becomes a new challenge. In addition, some real-world users have deadline requirements for their jobs, yet existing algorithms do not take into account the differing benefits of jobs or their differing sensitivity to deadlines.

This thesis designs scheduling algorithms to address these two problems, focusing on (1) the resource allocation strategy and (2) the job scheduling method. Both directly affect the overall performance of the platform and the utilization of system resources. In practice, many jobs are recurrent and have predictable attributes, so their execution time can be predicted by building a performance model. We observe that the cluster job scheduling problem has many similarities to the two-dimensional rectangular packing problem. This thesis therefore transforms the cluster resource scheduling problem into a variable rectangular packing problem and designs a Flexible Job Bin Packing algorithm (FJBP for short). To find a better solution, a genetic algorithm is then applied to further optimize the result. To handle the differing sensitivity of jobs to deadlines, we first classify jobs by their sensitivity and then design an elasticity- and deadline-aware job scheduling algorithm (DA for short) that accounts for both job urgency and expected job benefit.

We develop a simulator in MATLAB to verify the advantages of the algorithms. The FJBP algorithm decreases the overall completion time of jobs, increases the utilization of system resources, frees up bandwidth in the top layer, and reduces network congestion to a certain extent. The DA algorithm measures the comprehensive benefit of jobs before they execute and gives priority to jobs with large benefits. Experiments show that the overall benefit increases by 2.37 times on average.
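
The mapping from job scheduling to variable rectangular packing can be illustrated with a small sketch. This is not the FJBP algorithm from the thesis, whose details are not given in this abstract; it is a plain greedy baseline under stated assumptions. Each job is treated as a rectangle of fixed area (total work) whose width is the number of slots it receives and whose height is its running time, and running time is assumed to scale perfectly linearly with slot count. The names `Job`, `pack_jobs`, `min_slots`, and `max_slots` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: int        # rectangle area: total slot-time units (assumed known)
    min_slots: int   # narrowest allowed rectangle width
    max_slots: int   # widest allowed rectangle width


def pack_jobs(jobs, capacity):
    """Greedily place flexible jobs on `capacity` slots.

    Assumption: duration = ceil(work / width), i.e. perfect linear scaling.
    Returns ({job name: (width, start, finish)}, makespan).
    """
    free_at = [0] * capacity          # time at which each slot becomes free
    schedule = {}
    # Largest jobs first, a common heuristic for packing problems.
    for job in sorted(jobs, key=lambda j: j.work, reverse=True):
        # Slots ordered by how early they become available.
        order = sorted(range(capacity), key=lambda s: free_at[s])
        best = None
        for width in range(job.min_slots, min(job.max_slots, capacity) + 1):
            chosen = order[:width]
            start = free_at[chosen[-1]]   # wait until the latest chosen slot is free
            finish = start + math.ceil(job.work / width)
            if best is None or finish < best[2]:
                best = (width, start, finish, chosen)
        if best is None:
            raise ValueError(f"job {job.name} does not fit on {capacity} slots")
        width, start, finish, chosen = best
        for s in chosen:
            free_at[s] = finish
        schedule[job.name] = (width, start, finish)
    return schedule, max(free_at)


jobs = [
    Job("A", work=8, min_slots=1, max_slots=4),
    Job("B", work=4, min_slots=1, max_slots=2),
    Job("C", work=2, min_slots=1, max_slots=1),
]
schedule, makespan = pack_jobs(jobs, capacity=4)
print(schedule, makespan)
```

Because each job is placed as soon as it is considered, this sketch shows exactly the kind of greedy decision the abstract criticizes; a global search such as the genetic algorithm mentioned above could, for example, explore different job orders and widths instead of committing to the first locally best choice.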
Keywords/Search Tags: Big data, Parallel computing, Flexible job, Scheduler algorithms, Variable rectangular packing, Deadline aware