Research On Data-intensive Scientific Workflow Scheduling Algorithm Using Partitioning Reinforcement Learning Method

Posted on:2022-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:Q G Zhang

Full Text:PDF

GTID:2518306317989499

Subject:Computer Science and Technology

Abstract/Summary:


Scientific workflow scheduling is central to solving scientific research problems on cloud computing platforms, and scientific workflows are widely used in bioinformatics, astronomy, and other fields. Scientific workflow scheduling aims to discover task-priority rules and resource-allocation strategies that yield optimal schedules. Deep reinforcement learning methods are widely applied to multi-objective optimization problems and perform better than traditional heuristic algorithms on online scheduling problems. However, existing scheduling algorithms ignore the impact of data dependencies between tasks on the scheduling result, so schedulers perform poorly on data-intensive workflows. To address this problem, this paper proposes a data-intensive scientific workflow scheduling algorithm based on graph-partitioning reinforcement learning.

To reduce the volume of data transferred between different data centers when tasks are data-dependent, this paper uses graph partitioning to transform the original workflow into a secondary workflow. First, a block capacity parameter is set based on the critical path to constrain the block size. Second, each block in turn absorbs tasks according to the expected data volume and the block capacity parameter until the exit task is reached. Finally, each block is adjusted using the polygon rule to obtain a workflow labeled with block numbers. As a result, tasks with strong data dependencies are grouped into the same block.

To solve the data-intensive scientific workflow scheduling problem, this paper designs a partitioning reinforcement learning-based workflow scheduler (PRLS). A Markov model is built with the resource clusters and workflow queues as the environment and the scheduler as the agent. The block number, block output rate, task completion rate, resource performance, and related quantities serve as the state parameters, and a cerebellar model articulation controller (CMAC) serves as the value network, reducing the dimension of the state vector and producing the state value. The reward function is set according to the user's Pareto preference vector, and task-node pairs form the action set. The SARSA algorithm then schedules the workflow iteratively, finally yielding a scheduling strategy that meets the user's expectations. Simulation results show that, for data-intensive scientific workflows, the proposed algorithm optimizes scheduling with respect to both makespan and cost and obtains the expected scheduling results.
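The partitioning steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The workflow is assumed to be a DAG given as a `tasks` dict (task to runtime) and an `edges` dict ((parent, child) to expected data volume). Transfer time is data volume divided by a hypothetical `bandwidth`. The block capacity is assumed to equal the number of tasks on the critical path. The final polygon-rule adjustment is omitted because the abstract does not define it. All function names are hypothetical.

```python
from collections import defaultdict, deque


def _successors(edges):
    """Map each task to its children in the workflow DAG."""
    succ = defaultdict(list)
    for (u, v) in edges:
        succ[u].append(v)
    return succ


def topological_order(tasks, edges):
    """Kahn's algorithm: parents always appear before their children."""
    succ = _successors(edges)
    indeg = {t: 0 for t in tasks}
    for (_, v) in edges:
        indeg[v] += 1
    queue = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def critical_path(tasks, edges, bandwidth=1.0):
    """Longest path by runtime plus transfer time (data volume / bandwidth)."""
    succ = _successors(edges)
    dist = dict(tasks)                 # longest path length ending at each task
    prev = {t: None for t in tasks}
    for u in topological_order(tasks, edges):
        for v in succ[u]:
            cand = dist[u] + edges[(u, v)] / bandwidth + tasks[v]
            if cand > dist[v]:
                dist[v], prev[v] = cand, u
    node = max(dist, key=dist.get)     # end of the critical path
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def partition(tasks, edges, bandwidth=1.0):
    """Greedily grow blocks along the heaviest-data edges (hypothetical rule)."""
    # HYPOTHETICAL capacity rule: at most as many tasks as the critical path has.
    capacity = len(critical_path(tasks, edges, bandwidth))
    succ = _successors(edges)
    block_of = {}
    next_block = 0
    for seed in topological_order(tasks, edges):
        if seed in block_of:
            continue
        block_of[seed] = next_block
        size, current = 1, seed
        while size < capacity:
            free = [v for v in succ[current] if v not in block_of]
            if not free:               # exit task reached, or children already taken
                break
            # absorb the unassigned child with the largest expected data volume
            nxt = max(free, key=lambda v: edges[(current, v)])
            block_of[nxt] = next_block
            size += 1
            current = nxt
        next_block += 1
    return block_of


if __name__ == "__main__":
    tasks = {"A": 2, "B": 3, "C": 1, "D": 2}
    edges = {("A", "B"): 10, ("A", "C"): 1, ("B", "D"): 8, ("C", "D"): 1}
    print(critical_path(tasks, edges), partition(tasks, edges))
```

On this four-task example, where the path A, B, D carries most of the data, the sketch puts A, B, and D in block 0 and C in block 1, so only the two low-volume edges cross block boundaries.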
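The scheduler's learning components can be sketched in the same hedged way. The sketch covers four pieces: a CMAC (tile-coding) approximator of Q(s, a), a reward built from a Pareto preference vector, epsilon-greedy action selection, and the on-policy SARSA update Q(s, a) <- Q(s, a) + alpha * [r + gamma * Q(s', a') - Q(s, a)]. It rests on several assumptions not taken from the source:

* The state features (block number, block output rate, task completion rate, resource performance) are normalized to [0, 1].
* Each action index stands for one hypothetical task-node assignment.
* The reward is the negative preference-weighted sum of normalized makespan and cost increases.
* The names `CMAC`, `pareto_reward`, `choose_action`, and `sarsa_update` are hypothetical.

The thesis's actual state encoding, reward, and hyperparameters may differ.

```python
import random


class CMAC:
    """Cerebellar model articulation controller (tile coding) for Q(s, a).

    States are vectors in [0, 1]^n_dims; each action has its own weight table.
    """

    def __init__(self, n_dims, n_actions, n_tilings=8, tiles_per_dim=6, alpha=0.1):
        self.n_dims = n_dims
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.alpha = alpha / n_tilings  # spread the step size across tilings
        # +1 because shifted tilings can reach one extra tile per dimension
        self.per_tiling = (tiles_per_dim + 1) ** n_dims
        self.w = [[0.0] * (n_tilings * self.per_tiling) for _ in range(n_actions)]

    def active_tiles(self, state):
        """Return one weight index per tiling for the given state."""
        indices = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings  # tiling t is shifted by t/n_tilings of a tile
            flat = 0
            for x in state:
                x = min(max(x, 0.0), 1.0)
                coord = int(x * self.tiles_per_dim + offset)
                flat = flat * (self.tiles_per_dim + 1) + coord
            indices.append(t * self.per_tiling + flat)
        return indices

    def value(self, state, action):
        return sum(self.w[action][i] for i in self.active_tiles(state))

    def update(self, state, action, target):
        """Move Q(state, action) toward target by the configured step size."""
        delta = target - self.value(state, action)
        for i in self.active_tiles(state):
            self.w[action][i] += self.alpha * delta


def pareto_reward(d_makespan, d_cost, prefs):
    """HYPOTHETICAL reward: negative preference-weighted makespan/cost increase."""
    w_makespan, w_cost = prefs
    return -(w_makespan * d_makespan + w_cost * d_cost)


def choose_action(q, state, n_actions, epsilon, rng):
    """Epsilon-greedy choice over action indices (one per task-node pair)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q.value(state, a))


def sarsa_update(q, s, a, r, s_next, a_next, gamma=0.9, done=False):
    """On-policy SARSA target: r + gamma * Q(s', a'), or just r at episode end."""
    target = r if done else r + gamma * q.value(s_next, a_next)
    q.update(s, a, target)


if __name__ == "__main__":
    rng = random.Random(0)
    q = CMAC(n_dims=4, n_actions=3)
    # Hypothetical normalized features: block number, block output rate,
    # task completion rate, resource performance.
    s = [0.2, 0.5, 0.1, 0.8]
    a = choose_action(q, s, 3, epsilon=0.1, rng=rng)
    r = pareto_reward(0.05, 0.02, prefs=(0.7, 0.3))
    s_next = [0.2, 0.6, 0.2, 0.8]
    a_next = choose_action(q, s_next, 3, epsilon=0.1, rng=rng)
    sarsa_update(q, s, a, r, s_next, a_next)
    print(a, a_next, round(q.value(s, a), 4))
```

The sketch divides the step size by the number of tilings. With `alpha=1.0`, a single update therefore moves Q(s, a) exactly to its target, which keeps the learning rate easy to reason about. The sketch approximates action values Q(s, a) rather than the state values the abstract mentions, because SARSA's update is defined on state-action pairs.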

Keywords/Search Tags:

scientific workflow, cloud computing, reinforcement learning, graph segmentation


Related items

1	Makespan And Cost Optimization In Scientific Workflow Scheduling In Cloud Computing Environment
2	Research On Scientific Workflow Scheduling Method Based On Multi-object In Cloud Computing Environment
3	The Research On Scheduling Strategy Of Scientific Workflow In Cloud Computing Environment
4	Research On The Execution Optimization Of Scientific Workflow In Cloud
5	Research On Scientific Workflow Data Layout Strategy For Cloud Environment
6	Efficient scientific workflow scheduling in cloud environment
7	Research On Key Techniques Of Scientific Workflows In IaaS Environment
8	Research Of Replication And Placement Strategies For The Intermediate Data Of Scientific Workflow In Cloud
9	Research On Data Placement Strategy For Scientific Workflow In Cloud
10	Research On Cost Based Scientific Workflow Scheduling In Cloud Computing Environment