
Research And Implementation Of Job Runtime Prediction And Job Scheduling Based On High-performance Computing Job Log

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: X M Chen
GTID: 2518306491996899
Subject: Computer technology
Abstract/Summary:
In high-performance computing environments, computing resources are critical and in short supply. This is especially evident when running jobs related to computational fluid dynamics. In real production environments, high-performance computing resources often cannot meet the demands of computing tasks because of power limitations, computing nodes that cannot be interconnected, and the high cost of resource consumption. As a result, either a task in the current queue that occupies a large share of resources runs for too long, leaving subsequent tasks waiting for a long time, or a task is so large that subsequent tasks cannot obtain free resources to run at all.

To improve the utilization of high-performance computing resources and the efficiency with which researchers solve problems in real engineering environments, this thesis studies job runtime prediction algorithms using a high-performance computing job log data set. The ensemble machine learning algorithm LightGBM is applied for the first time to job runtime prediction and to scientific computing workflow execution time prediction, and three scheduling strategies are proposed on the basis of the predictions. Guided by the experimental results, the strategy of running jobs with shorter execution time first, together with a greedy algorithm, is selected to pre-schedule computing tasks. This effectively reduces the overall execution time and waiting time of jobs and workflows and helps release computing resources as early as possible. The main research contents are as follows:

(1) The historical job log data set of a high-performance computing system is used to study how well different machine learning algorithms predict the execution time of high-performance computing jobs. In line with the engineering requirement to improve the utilization of computing resources, the main process for solving computational fluid dynamics problems is analyzed and described with directed graphs, the corresponding scientific computing workflow model is described in an XML-based lightweight process description language, and the relevant task node information and data resource files are extracted. Based on the performance of the prediction algorithms and the job log data of computational fluid dynamics computations, the prediction model is used to predict the runtime of actual computational fluid dynamics jobs and the execution time of workflows.

(2) On the basis of the predicted job runtime, three scheduling strategies are proposed: running first the jobs with shorter runtime, the jobs with a larger requested resource value, or the jobs with a larger ratio of requested resource value to job runtime. Comparative experiments across the three strategies show that running jobs with shorter runtime first performs best, so this strategy is selected for pre-scheduling jobs and workflows.

(3) Because simulation is effective and does not occupy real computing resources, the WorkflowSim and GridSim simulation platforms are used to simulate task scheduling for linear and nonlinear scientific computing workflows, respectively. The experimental results show that for linear scientific computing workflows, it is more direct and efficient to schedule tasks with the first-come-first-served algorithm in dependency order. For nonlinear scientific computing workflows, combining the job execution time prediction algorithm with a greedy algorithm, and backfilling tasks according to the actual number of available resources, effectively reduces workflow execution cost, improves workflow execution efficiency, and increases the utilization of computing resources.
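The three priority rules in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Job` fields, the strategy names, the sample values, and the simplifying assumption that jobs run one after another on a single shared resource pool are all hypothetical, chosen only to show how the ordering rule affects mean waiting time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical fields; the thesis does not specify a job record layout.
    name: str
    predicted_runtime: float   # predicted runtime in seconds
    requested_resources: int   # requested resource value, e.g. cores


# Sort keys for the three strategies (names are illustrative).
STRATEGIES = {
    "shortest_runtime_first": lambda j: j.predicted_runtime,
    "largest_request_first": lambda j: -j.requested_resources,
    "largest_ratio_first": lambda j: -(j.requested_resources / j.predicted_runtime),
}


def order_jobs(jobs, strategy):
    """Return the jobs in the order the given strategy would run them."""
    return sorted(jobs, key=STRATEGIES[strategy])


def mean_wait(ordered_jobs):
    """Mean waiting time, assuming jobs run strictly one after another."""
    elapsed = 0.0
    total_wait = 0.0
    for job in ordered_jobs:
        total_wait += elapsed          # this job waited for all earlier ones
        elapsed += job.predicted_runtime
    return total_wait / len(ordered_jobs)


# Made-up sample queue, for illustration only.
jobs = [Job("a", 300.0, 64), Job("b", 60.0, 8), Job("c", 120.0, 32)]

for strategy in STRATEGIES:
    ordered = order_jobs(jobs, strategy)
    print(strategy, [j.name for j in ordered], mean_wait(ordered))
```

Under this serial simplification, running the shortest job first minimizes mean waiting time on the sample queue, which matches the direction of the result reported above; a real cluster with parallel nodes and backfilling would need a simulator such as those named in item (3).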
Keywords/Search Tags: High-performance computing, Job runtime prediction, Computational fluid dynamics, Scientific computing workflow, Scheduling strategy