Font Size: a A A

Research On The Real-Time Fault-Tolerant Scheduling Algorithms For Distributed Systems

Posted on:2009-09-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:W LuoFull Text:PDF
GTID:1118360278462363Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of high speed network and high performance computers, distributed systems have been widely applied for critical real-time systems, in which correctness of systems depends not only on the results of a computation, but also on the time instant at which the results are available. If the real-time systems can not produce the correct results before deadlines, it can cause catastrophic results. As such, several fault-tolerance mechanisms must be integrated to guarantee timing constrains of real-time tasks even in presence of failures.Real-time fault-tolerant scheduling plays a key role in implementing the fault-tolerantce in real-time distributed systems. Aiming the characteristics of the modern real-time distributed systems, the real-time fault-tolerant scheduling algorithms based on homogeneous as well as heterogeneous distribted systems are designed to meat many requirements targets. The goal of this dissertation's research is to satisfy diverse performance metrics of real-time applications, including schedulability, reliability and the acceptance ratios by high-quality scheduling algorithms.Modern real-time distributed systems are always applied to the systems with strong time and spaces contraints, such as the control systems in secondary planet. Increasing the number of processors leads to more spaces and energy consumptions. Therefore, one major research interests in this thesis is to reseach the schedulability in homogeneous real-time distributed systems, that is to try to reduce the redundancies without jeopardizing the fault-tolerance abilityTraditional FTRMFF leverage both active backup copy and passive backup schemes to achieve fault-tolerance and adopt the first-fit strategy to try to reduce the worst case response time(referrrd as WCRT), which make as many backup copies as possible to be executed as passive wasy in order to reduce the system recundancies. Practical examples reveal that the FTRMFF will exhibit considerable redundancies if the WCRTs of primary copies are larger that those of backup copies. To remedy this deficiency, the terminates the execution of active backup copies (referred as Tercos) algorithm is proposed. Tercos terminates the execution of active backup copies when corresponding primary copies are successfully completed. Tercos aims at reducing scheduling lengths in fault-free scenario to enhance schedulability by virtue of executing portions of active backup copies in passive forms.The second algorithm (referred to as Debus) uses a deferred active backup scheme to further minimize schedule lengths to improve the schedulability performance. Debus schedules active backup copies as late as possible while terminating active backup copies when their primary copies are completed. Experimental results show that compared with existing algorithms in literature, Tercos can significantly improve schedulability over FTRMFF. Furthermore, empirical results reveal that Debus can enhance schedulability over Tercos when the WCRT of primary copies are nearly equl to the WCRT of active backup copies.The homogeneous distributed systems require that each node in the distributed systems should be identical. However, this is very hard to be implemented in practical. Therefore, heterogeneous distributed systems are more widely adopted in practical. Different from homogeneous distributed systems, the most important feature of heterogeneous distributed systems is their great variety of the computing power and the reliabilities of different processors. So reliability issues should be taken into account when designing algorithms for heterogeneous distributed systems. Most existing real-time fault-tolerant scheduling algorithms, based on heterogeneous distributed systems, either only support non-preemptive, aperiodic tasks or only consider one type of backup-copy. To address these problems, in this thesis we propose two novel reliability models based on preemptive periodic tasks. Moreover, three real-time fault-tolerant scheduling algorithms based heterogeneous distributed systems, namely NRFTAHS and RDFTAHS which are based on the first reliability model and IRDFTAHS which are based on the second reliability model, are presented. The first reliability model defines the system reliability which does not consider the reliability after fault occurs. NRFTAHS tries to assign tasks with the aim of improving schedulabilty of the systems, while RDFTAHS with the aim of improving reliability of system. The second reliability modle, however, fully consider the system reliability when fault occurs. Thereore, IRDFTAHS is flexible and practical than existing algorithms. Finally, simulation experiments are carried out to compare the three algorithms in several aspects, the experiments results show that the RDFTAHS performs significantly better than NRFTAHS with respect to overall performance and IRDFTAHS is superior to RDFTAHS.Different from static algorithm, dynamic algorithm is more complex because ti has to adjust its scheduling strategy according to current situation. We propose a dynamic and reliability-driven real-time fault-tolerant scheduling algorithm on heterogeneous distributed systems (DYFARS). DYFARS employs reliability costs as its main objective to dynamically schedule independent, non-preemptive aperiodic tasks, therefore system reliability is enhanced without additional hardware costs. In addition, scheduling time and dispatch time are both incorporated into our scheduling scheme so as to make scheduling results more realistic and precise.
Keywords/Search Tags:Distributed systems, Real-Time scheduling, Fault-tolerance, Primary/Backup copy, dynamic priority scheduling
PDF Full Text Request
Related items