Reliability Oriented Models And Algorithms Of Grid Workflow Scheduling

Posted on: 2012-04-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Li
Full Text: PDF
GTID: 1118330374987514
Subject: Computer application technology
Abstract/Summary:
With the development and application of Grid technology, the research focus has shifted from implementing the core components and basic functions of Grid systems to investigating how a Grid can provide non-trivial Quality of Service (QoS). Reliability is one of the important QoS metrics. As Grid application fields expand, Grid reliability has become a key factor restricting the wider adoption of Grid technology. However, complete reliability is very hard to provide because Grid systems are complex, dynamic, and heterogeneous. Recently, researchers have not only studied reliability modeling, evaluation, and analysis techniques, but have also integrated reliability considerations into Grid architecture, resource management, and task scheduling.

Grid workflow is becoming the typical paradigm for designing large-scale scientific applications in a Grid environment. Workflow scheduling is one of the key issues, because it directly affects whether Grid workflows execute successfully and efficiently. Reliability oriented Grid workflow scheduling is therefore an important issue with both theoretical and practical value.

Characteristics of the Grid such as dynamism and autonomy pose great challenges to running workflows successfully and efficiently. To guarantee and improve the reliability and performance of task execution in a Grid environment, this thesis analyzes current reliability research conducted in the different layers of Grid systems, targets the difficulties and challenges in workflow scheduling, investigates efficient and effective Grid workflow scheduling mechanisms in depth, and explores policies for enhancing workflow execution reliability.
The main research content and contributions are as follows:

(1) A Grid resource reliability evaluation model based on the M/M/N repairable queuing system, and a corresponding dynamic Grid workflow scheduling model based on resource reliability evaluation, are proposed. Because a Grid resource can fail and later recover, this thesis adopts the M/M/N repairable queuing system to establish a comprehensive evaluation model that describes the reliability and dynamic processing capacity of Grid resources in a multi-cluster environment. Solving the model yields metrics such as steady-state availability and average queue length, from which the average queuing time of workflow tasks on each resource site can be calculated. The model considers not only possible failures of Grid resources but also the dynamic workloads of resource sites, which brings it closer to practical Grid systems. On the basis of this resource reliability evaluation model, a queuing time aware dynamic Grid workflow scheduling (QTADGWS) model and the corresponding algorithm are proposed. The QTADGWS algorithm uses a list-based method and seeks better performance by overlapping data transfer time with queuing time. Simulation results indicate that QTADGWS achieves better performance in terms of makespan and average waiting time.

(2) A reliability oriented workflow scheduling algorithm with a deadline constraint is put forward. Targeting a multi-cluster Grid environment, this thesis uses a Markov process to model the availability of processors and combines it with a stochastic service model to describe the dynamic workload and processing capacity of each Grid site. A concept named Deadline Satisfaction Degree of Workflow (DSDW) is put forward, and a method for calculating it is provided based on the availability and dynamic service model of resources.
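Both contributions above rest on two standard quantities: the steady-state availability of a repairable resource and the queuing delay at a site. The sketch below illustrates them with textbook formulas only. It assumes a two-state (up/down) Markov model for availability and a plain M/M/N queue without server breakdowns (the Erlang C formula), which is a simplification and not the thesis's M/M/N repairable model. All function and parameter names are hypothetical.

```python
from math import factorial


def steady_availability(failure_rate, repair_rate):
    """Steady-state availability of a two-state (up/down) Markov model."""
    return repair_rate / (failure_rate + repair_rate)


def erlang_c(n, arrival_rate, service_rate):
    """Probability that an arriving task must queue in a plain M/M/N system.

    Assumption: servers never fail here, unlike the repairable model
    described in the abstract.
    """
    load = arrival_rate / service_rate  # offered load in Erlangs
    rho = load / n                      # per-server utilization
    if rho >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = load ** n / factorial(n) / (1 - rho)
    head = sum(load ** k / factorial(k) for k in range(n))
    return tail / (head + tail)


def mean_queue_wait(n, arrival_rate, service_rate):
    """Average time a task spends waiting in the queue (excluding service)."""
    return erlang_c(n, arrival_rate, service_rate) / (n * service_rate - arrival_rate)


def mean_queue_length(n, arrival_rate, service_rate):
    """Average number of waiting tasks, via Little's law."""
    return arrival_rate * mean_queue_wait(n, arrival_rate, service_rate)
```

A scheduler in this spirit could compare `mean_queue_wait` across candidate sites and prefer sites whose estimated queuing time can be overlapped with data transfer, as the QTADGWS description suggests.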
Based on the rule of maximizing DSDW, the deadline distribution issue is modeled as a constrained non-linear programming problem, which can be solved with an interior point algorithm. Finally, a Deadline Satisfaction Enhanced Scheduling Algorithm for Workflow (DSESAW) is proposed. Simulation results show that this algorithm adapts better to the dynamic Grid environment and better guarantees the user's deadline.

(3) A self-adaptive workflow scheduling mechanism for OSG and a reliability enhancement policy are proposed. Targeting a real Grid environment, the Open Science Grid (OSG), and building on the workflow management system Swift, a multi-stage Grid workflow scheduling mechanism is designed that involves Grid site discovery, initial evaluation, and dynamic evaluation and selection. Performance prediction values based on time series are used to assess the initial performance of each site, and a site selection algorithm based on a self-adaptive scoring mechanism for Grid sites is then proposed. To improve the reliability of workflow execution and potentially reduce execution time, an incremental task replication policy based on the empirical CDF of the queuing time of sites is introduced. Experiments on OSG show that these algorithms and this policy can effectively decrease JRR and JSL. After the proposed multi-stage workflow scheduling mechanism and the optimization policy are integrated into Swift, the number of tasks within the workflow that can be completed successfully in OSG increases from several hundred to four thousand.
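The replication idea in contribution (3) can be illustrated with a minimal sketch. It assumes a hypothetical rule: a task that has already waited longer than most historical tasks at its site (as measured by the empirical CDF of past queuing times) gets one more replica, up to a cap. The threshold, the cap, and all names are assumptions for illustration and are not taken from the thesis.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F(t) = fraction of observed queuing times that are <= t."""
    ordered = sorted(samples)
    count = len(ordered)
    if count == 0:
        raise ValueError("at least one queuing-time sample is required")

    def cdf(t):
        return bisect_right(ordered, t) / count

    return cdf


def should_replicate(waited, cdf, threshold=0.9, replicas=0, max_replicas=2):
    """Decide whether to submit one more replica of a queued task.

    Hypothetical rule: replicate when the time already spent waiting
    exceeds what a `threshold` share of past tasks needed, and the
    replica cap has not been reached.
    """
    return replicas < max_replicas and cdf(waited) >= threshold


# Example: historical queuing times (seconds) observed at one site.
site_cdf = empirical_cdf([10, 20, 30, 40])
```

Calling `should_replicate` repeatedly as the waiting time grows adds replicas one at a time, which matches the incremental character of the policy described above while keeping the extra load on the Grid bounded by `max_replicas`.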
Keywords/Search Tags:Grid, reliability, workflow, failure, scheduling algorithm