
Research On Data-intensive Scientific Workflow Scheduling Algorithm Using Partitioning Reinforcement Learning Method

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Q G Zhang
GTID: 2518306317989499
Subject: Computer Science and Technology
Abstract/Summary:
Scientific workflow scheduling is central to solving scientific research problems on cloud computing platforms and is widely used in bioinformatics, astronomy, and other fields. It aims to discover task priority rules and resource allocation strategies that yield optimal scheduling solutions. Deep reinforcement learning methods are commonly used to solve multi-objective optimization problems, and they handle online scheduling problems better than traditional heuristic algorithms. Existing scheduling algorithms ignore the impact of data dependence between tasks on the scheduling results, so schedulers perform poorly on data-intensive workflows. To solve this problem, this paper proposes a data-intensive scientific workflow scheduling algorithm based on graph segmentation reinforcement learning.

To reduce the amount of data transmitted between data centers when tasks are data dependent, this paper uses graph segmentation to transform the original workflow into a secondary workflow. First, a block capacity parameter is set based on the critical path to constrain the block size. Second, each block absorbs tasks in turn, according to the expected data volume and the block capacity parameter, until the exit task is reached. Finally, each block is adjusted using the polygon rule to obtain a workflow whose tasks carry block numbers. Tasks with strong data dependence are grouped into the same block.

To schedule data-intensive scientific workflows, this paper designs a partitioning reinforcement learning-based workflow scheduler (PRLS). Resource clusters and workflow queues form the environment, and the scheduler acts as the agent, which together define a Markov model. The block number, block output rate, task completion rate, resource performance, and similar quantities serve as the state parameters, and a cerebellar model articulation controller (CMAC) is used as the value network to reduce the dimension of the state vector and obtain the state value. The reward function is set according to the user's preference Pareto vector, and task-node pairs form the action set. The SARSA algorithm then schedules the workflow iteratively until it arrives at a scheduling strategy that meets the user's expectations. Simulation results show that the proposed algorithm can optimize the scheduling of data-intensive scientific workflows in terms of both makespan and cost, and obtains the expected scheduling results.
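The combination of a CMAC value network with SARSA described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the class name `CMACQ`, the tiling parameters, the learning rate, the discount factor, and the example state features are all assumptions chosen for illustration. The sketch treats the CMAC as plain tile coding over a normalized state vector and applies the standard on-policy SARSA update, Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)].

```python
import math
from collections import defaultdict


class CMACQ:
    """Tiny CMAC (tile-coding) approximator for Q(state, action).

    Hypothetical illustration: names and parameters are not from the thesis.
    """

    def __init__(self, n_tilings=4, tile_width=0.25):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        self.weights = defaultdict(float)  # one weight per active tile

    def _tiles(self, state, action):
        """Return the tiles activated by (state, action), one per tiling."""
        keys = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a fraction of the tile width.
            offset = t * self.tile_width / self.n_tilings
            coords = tuple(
                math.floor((x + offset) / self.tile_width) for x in state
            )
            keys.append((t, coords, action))
        return keys

    def value(self, state, action):
        """Q(state, action) is the sum of the weights of the active tiles."""
        return sum(self.weights[k] for k in self._tiles(state, action))

    def sarsa_update(self, s, a, reward, s_next, a_next,
                     alpha=0.1, gamma=0.9, done=False):
        """One on-policy SARSA step; returns the TD error."""
        target = reward
        if not done:
            target += gamma * self.value(s_next, a_next)
        td_error = target - self.value(s, a)
        step = alpha / self.n_tilings  # spread the step over active tiles
        for k in self._tiles(s, a):
            self.weights[k] += step * td_error
        return td_error


# Assumed state features, normalized to [0, 1]:
# (block output rate, task completion rate); actions index task-node pairs.
q = CMACQ()
td = q.sarsa_update(s=(0.1, 0.2), a=0, reward=1.0, s_next=(0.3, 0.4), a_next=1)
```

Because nearby states share tiles, an update for one state also shifts the estimate for its neighbors, which is how a CMAC generalizes over a state vector without storing a full table. In a scheduler the reward would come from the user's preference over makespan and cost; that mapping is left out here because the abstract does not specify it.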
Keywords/Search Tags:scientific workflow, cloud computing, reinforcement learning, graph segmentation