Font Size: a A A

Researches On Scheduling Algorithms For Parallel Applications And Parallel Workloads In Cloud Datacenter Environments

Posted on:2015-12-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:K F DengFull Text:PDF
GTID:1108330509460959Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Attracted by elasticity, flexibility, and efficiency of the resource model offered by cloud computing, recent years have seen an increasing number of scientific applications being migrated to cloud datacenters. Since resources are provisioned on-demand and charged in a pay-as-you-go manner, cloud computing holds the promise for cost-efficient scientific computing. Moreover, the datacenters in clouds are connected by Internet,which greatly facilitates scientific collaboration and data sharing. However, due to challenges introduced by big data, resource virtualization, virtual machine diversification, and periodic billing, it is nontrivial to make use of such advantages for scientific computing.Resource provisioning and task scheduling are critical for efficient execution of scientific applications in cloud datacenters. Although existing work have partially address the issue, they are still inefficient for problems such as scheduling data-intensive scientific workflows in multi-datacenter collaborative cloud environments, satisfying multiple Qo S constraints by using different types of on-demand resources, and handling fluctuating workloads with varying performance requirements. To address the challenge, in this work we study the scheduling of parallel applications and workloads in cloud datacenters.The main contribution of our work is four-fold:(1) We propose a weighted K-means clustering based data and task scheduling algorithm for efficient execution of data-intensive scientific workflows in collaborative cloud environments constituted of multiple datacenters. In the proposed algorithm, three types of dependencies are defined to capture and measure the degree of correlation between input datasets and tasks in a workflow application. Based on the definition, a dependency matrix is constructed and partitioned in order for distributing the input datasets and tasks among different datacenters such that under the constraint of storage balancing, the amount of data transfer across different datacenters is minimized. After the distribution,we use schedule adjustment, task replication, and data pre-staging for further improvement. We implement the algorithm in Swin De W-C cloud workflow system and the experimental results show that our algorithm is much better than the original K-means-based data placement algorithm.(2) We present a K-cut based multi-level graph partitioning algorithm for efficient execution of data-intensive scientific workflows with fixed input datasets among multiple datacenters. We adopt a multi-level graph partitioning framework for achieving the goal.In the framework, we first contract the fixed input datasets and their consuming tasks in the same datacenter into a single fixed node in the corresponding workflow graph. Then,by using a heavy connectivity matching approach, we coarsen the workflow graph to a predefined scale through a level-by-level manner. By taking the fixed nodes as terminals,the coarsened workflow graph is directly partitioned into K parts such that the total edge weight of the cut set is minimized. Finally, the partitioned graph is projected back to the original workflow graph. During the projection, the partitioning result is refined and the computation load balancing is maintained. We implement our algorithm and five other related algorithms in Cloud Sim, and the experimental results demonstrate the effectiveness of the proposed algorithm.(3) We design a DAG scheduling algorithm by using the recently proposed biogeographybased optimization(BBO) technique. In the presented algorithm, instead of encoding both task mapping and ordering for a scheduling solution, we use a singleton structure that only encode the task mapping part, based on which the task ordering part is computed. As a result, the searching space is largely reduced in comparison with the integrated encoding scheme. Furthermore, instead of aggregating different performance metrics into the fitness function, we introduce a compare function to compute the fitness value indirectly,which offers a flexibility of adapting the algorithm to a wide range of optimization problems. In order to deal with the uncertainty of the type and number of resources, we use the maximum parallelism of a DAG application to initialise the resource pool required by the proposed algorithm. Since the equality of the initial population generated by a random approach is very poor, we take low complexity heuristic-based algorithms as baseline algorithms to obtain a rapid convergence. Experimental results show that the BBO-based DAG scheduling algorithm is better than other classic population-based algorithms such as genetic algorithm, ant colony optimization, particle swarm optimization, and chemical reaction optimization.(4) We propose a algorithm portfolio based scheduling algorithm for long-term execution of scientific workloads in cloud datacenters. In the proposed portfolio scheduler,dozens of scheduling algorithms that are capable to cope with different types of workload patterns are included in the algorithm portfolio. To facilitate the trade-off between workload performance and execution cost, a utility function is designed to estimate the goodness of the constituent algorithms in the portfolio. We adopt an online simulation approach for selecting an appropriate algorithm to schedule the current workload. Since there too many algorithms in the portfolio, we develop a time-constrained simulation algorithm which can enlarge the probability of finding the best scheduling algorithm. The basic idea of the simulation algorithm is to cluster the constituent algorithms based on their recent performance and allocate time quotas to different clusters, such that well-performed algorithms have larger probability to be simulated. We implement the portfolio scheduler by extending the DGSim software, and the evaluation shows that the portfolio scheduler can obtain better result than all consistent algorithms.To sum up, in this work we propose four algorithms to address several critical issues while scheduling parallel applications and workloads in cloud datacenter environments.We demonstrate the effectiveness of the proposed algorithms by using standard benchmark applications and workloads, and by comparing with other state-the-art algorithms.The contribution of this work has theoretical significance in pushing forward the study of scheduling algorithms, and hence finds its practical value in making progress for scientific computing.
Keywords/Search Tags:Cloud Computing, Datacenter, DAG Application, Scientific Workflow, Parallel Workload, Resource Management, Task Scheduling
PDF Full Text Request
Related items