A Study Of Task Scheduling Algorithm And Fault-Tolerant Problem In Optical Network Based Distributed Computing System

Posted on: 2009-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Sun
GTID: 2178360242976816
Subject: Communication and Information System
Abstract/Summary:
In this paper, we introduce the optical network based distributed computing system, which can provide services for many advanced applications such as e-science applications, collaborative design, and virtual reality. The system interconnects distributed computing resources through an optical network that dynamically provides high-bandwidth, low-latency connections between them.

Among the challenges such a system faces, we study the task scheduling problem and the fault-tolerance problem. To improve system performance and execute the applications that users submit, the system needs an efficient task scheduling algorithm. We first present an extended list scheduling algorithm, which extends the classic list scheduling algorithm by taking into account routing and wavelength assignment for data communications. We then present a scheduled critical path (SCP) based algorithm, which schedules tasks according to their actual execution cost. We construct a simulator to run the experiments and evaluate the performance of the proposed algorithms. On a simple system, we compare the algorithms against the optimal results calculated by the OPL Studio software. We also compare the algorithms on more complex systems. In all experiments, the SCP algorithm produces the better schedule.

We also address the fault-tolerance problem of the optical network based distributed computing system. We first propose a policy that achieves fault tolerance by rescheduling the application when a fault occurs in the system. We evaluate this rescheduling policy, and the experimental results show that it achieves better performance when there are many faults in the system.
However, for some applications with real-time requirements, the rescheduling policy cannot provide a guaranteed finish time. In the optical network based distributed computing system, the guaranteed finish time is used to decide whether to accept an application with real-time requirements. Therefore, we introduce backup resources for tasks and data communications. According to the fault model, we schedule the tasks and data communications on both the primary and backup resources. We propose two types of backup policies: the overlay policy and the joint policy. The overlay policy directly applies the existing fault-tolerant policies for computing resources and network resources. The joint policy considers the backup of computing resources and the backup of network resources at the same time, to avoid unnecessary resource redundancy. The simulation results show that the joint policy achieves better performance.
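
As an illustration of the classic list scheduling idea that the extended algorithm builds on, the following is a minimal sketch and not the thesis's algorithm. It does not model routing, wavelength assignment, SCP, or backup resources. The function name `list_schedule`, the bottom-level priority rule, and all task names and costs are assumptions chosen for the example.

```python
from collections import defaultdict


def list_schedule(exec_cost, edges, num_procs):
    """Classic list scheduling on a task graph (illustrative sketch only).

    exec_cost: {task: execution time}, assumed strictly positive
    edges:     {(pred, succ): communication time}, paid only when the
               two tasks run on different processors
    Returns {task: (processor, start, finish)}.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for (u, v) in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Priority = bottom level: longest path (execution + communication)
    # from the task to an exit task.
    blevel = {}

    def bottom_level(t):
        if t not in blevel:
            blevel[t] = exec_cost[t] + max(
                (edges[(t, s)] + bottom_level(s) for s in succs[t]),
                default=0,
            )
        return blevel[t]

    # With positive execution costs a predecessor always has a larger
    # bottom level than its successors, so this order is topological.
    order = sorted(exec_cost, key=bottom_level, reverse=True)

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    schedule = {}
    for t in order:
        best = None
        for p in range(num_procs):
            # Input data is available once every predecessor has finished,
            # plus the communication time if it ran on another processor.
            ready = max(
                (schedule[u][2] + (0 if schedule[u][0] == p else edges[(u, t)])
                 for u in preds[t]),
                default=0,
            )
            start = max(ready, proc_free[p])
            finish = start + exec_cost[t]
            if best is None or finish < best[2]:
                best = (p, start, finish)  # keep the earliest finish time
        schedule[t] = best
        proc_free[best[0]] = best[2]
    return schedule


# Hypothetical diamond-shaped task graph: A -> {B, C} -> D
demo_exec = {"A": 2, "B": 3, "C": 3, "D": 2}
demo_edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
demo_schedule = list_schedule(demo_exec, demo_edges, num_procs=2)
demo_makespan = max(finish for (_, _, finish) in demo_schedule.values())
```

In this sketch the communication cost is a fixed number per edge. An extension in the spirit of the abstract would replace that constant with the time at which a lightpath can actually be routed and assigned a wavelength, but how the thesis does this is not specified here.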
Keywords/Search Tags: Distributed computing system, optical network, task scheduling, fault-tolerant policy