
Design Of Workflow Scheduling Algorithms Under Throughput And Budget Constraints In Multi-Cloud Environments

Posted on: 2020-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: R X Li  Full Text: PDF
GTID: 2428330590481866  Subject: Communication and Information System
Abstract/Summary:
With the rapid development and deployment of cloud computing infrastructures, applications in many scientific domains increasingly use cloud resources for big data storage and analysis. However, managing and executing big data scientific workflows that process streaming datasets in multi-cloud environments has become a significant challenge. The computing modules in a scientific workflow generally comprise a series of computing tasks such as data generation, processing, and analysis. Streaming workflows, meanwhile, continuously produce large quantities of experimental or simulation data, which must be processed in a timely manner subject to performance and resource constraints. To optimize various objectives and improve scalability, data- and network-intensive scientific workflows are increasingly deployed in multi-cloud environments, where reducing the cost of data transmission between clouds becomes a challenge. In this thesis, we formulate scheduling problems with two different objectives in multi-cloud environments, namely, maximizing the throughput of streaming workflows under a budget constraint (MaxStream-MC) or minimizing the execution cost of streaming workflows under a throughput constraint (MinStream-MC). This thesis consists of the following technical components:

(1) We adopt a three-layer workflow architecture to perform inter-cloud and intra-cloud workflow scheduling: (i) the top layer defines a workflow structure composed of computing modules with inter-module data transfer and execution dependencies; (ii) the middle layer defines a cloud-based network of VM instances provisioned on the physical machines (PMs) located in different data centers; (iii) the bottom layer defines a number of distributed data centers, which are organized as clusters of PMs and connected via high-speed networks. Based on this architecture, we construct rigorous mathematical models to formulate the workflow scheduling problems in multi-cloud environments and analyze their computational complexity.

(2) For the formulated problems, we propose two heuristic algorithms: B-StreamWS, a budget-constrained workflow mapping algorithm that maximizes the throughput of a streaming workflow in a multi-cloud environment, and FR-StreamWS, a throughput-constrained workflow mapping algorithm that minimizes the execution cost of a streaming workflow in a multi-cloud environment. The key steps are: (i) sort the modules in the workflow to determine the scheduling order, (ii) assign virtual machines, and (iii) select physical machines and links.

(3) The proposed heuristic algorithms are simulated at different workflow and cloud scales and compared with existing algorithms. The experimental results show that B-StreamWS achieves 31%, 25%, 46%, and 28% throughput improvement compared with the B-RATE, MCWM, Critical-Greedy (CG), and Greedy LDP algorithms, respectively, and that FR-StreamWS achieves 36% and 23% cost reduction compared with TP-RATE and SC-PCP, respectively, under the same throughput constraint.
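To make the first two key steps in (2) concrete, the following is a minimal Python sketch of budget-constrained VM assignment for a streaming workflow. It is not the thesis's B-StreamWS algorithm, whose details are not given in this abstract. It assumes a simplified model in which throughput is the reciprocal of the slowest module's processing time, each module occupies one VM, and cost is the sum of per-VM prices; inter-cloud data transfer cost and the selection of physical machines and links (step iii) are ignored. All function names, VM types, speeds, and prices are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_order(deps):
    """Step (i), simplified: order modules so predecessors come first.

    deps maps each module name to the set of modules it depends on.
    """
    return list(TopologicalSorter(deps).static_order())


def throughput(workloads, vm_speed, assignment):
    """Assumed model: steady-state throughput = 1 / bottleneck module time."""
    bottleneck = max(workloads[m] / vm_speed[assignment[m]] for m in workloads)
    return 1.0 / bottleneck


def greedy_upgrade(workloads, vm_speed, vm_price, budget):
    """Step (ii), simplified: repeatedly upgrade the bottleneck module's VM.

    Starts every module on the cheapest VM type, then gives the current
    bottleneck the cheapest faster type that still fits within the budget.
    Stops when the bottleneck cannot be upgraded any further.
    """
    cheapest = min(vm_price, key=vm_price.get)
    assignment = {m: cheapest for m in workloads}
    cost = sum(vm_price[t] for t in assignment.values())
    if cost > budget:
        raise ValueError("budget too small for even the cheapest assignment")
    while True:
        slowest = max(workloads,
                      key=lambda m: workloads[m] / vm_speed[assignment[m]])
        current = assignment[slowest]
        options = [t for t in vm_speed
                   if vm_speed[t] > vm_speed[current]
                   and cost - vm_price[current] + vm_price[t] <= budget]
        if not options:
            return assignment, cost
        upgrade = min(options, key=vm_price.get)
        cost += vm_price[upgrade] - vm_price[current]
        assignment[slowest] = upgrade


# Hypothetical four-module workflow: a feeds b and c, which both feed d.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
workloads = {"a": 10, "b": 40, "c": 20, "d": 10}   # work units per data item
vm_speed = {"small": 1, "medium": 2, "large": 4}   # work units per second
vm_price = {"small": 1, "medium": 2, "large": 4}   # price per VM
order = schedule_order(deps)
assignment, cost = greedy_upgrade(workloads, vm_speed, vm_price, budget=8)
```

In this simplified model the topological order does not affect the greedy upgrade itself; it is shown only to illustrate how a scheduling order can be derived from execution dependencies. A full mapping algorithm would also account for data transfer between modules placed in different clouds.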
Keywords/Search Tags: Scientific workflow, workflow mapping, cloud computing, throughput