Distributed File System Level Fault-tolerant Mechanism

Posted on: 2006-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
GTID: 2208360152998483
Subject: Computer system architecture
Abstract/Summary:
A file system is one of the linchpins on which a successful operating system depends. It manages the resources in the system and stores programs and data. A DFS (distributed and parallel file system) not only retains the services of a traditional file system but also possesses most of the advantages of distributed and parallel systems, such as resource sharing, high reliability, high availability, high throughput, and large storage capacity. In recent years, DFS has become an active research topic both in China and abroad.

DPFS is a distributed and parallel file system developed by the 8010 research lab. Together with the distributed scheduling mechanism and the distributed database system, it makes up DPLinux (Distributed and Parallel Linux Operating System). Through a series of user interfaces, users can access DPFS easily without considering the details of its implementation. Built on the Linux kernel, DPFS is designed for high compatibility, strong processing capability, and good distributed file system performance.

As a basic guarantee of the reliability and stability of the system, the FTM (fault-tolerant mechanism) is a critical part of DPFS. When a failure occurs, DPFS can recover to a consistent state and continue its service with the help of the FTM. A running DPFS node also relies on the FTM to detect errors and thereby check its current state. The FTM is divided into three sub-systems: the operating-function detection and self-recovery sub-system, the remote real-time fault-tolerant sub-system, and the log-based fault-tolerant sub-system. The operating-function detection mechanism monitors and diagnoses errors that occur in the system and saves information about them. The self-recovery mechanism resolves system failures promptly at the local site. The remote real-time fault-tolerant sub-system recovers remote sites from errors by sending them the correct operations over the network and executing those operations there. The log-based fault-tolerant sub-system recovers from permanent failures that cannot be resolved immediately. Together, the three sub-systems form the parallel multi-level fault-tolerant function model of DPFS. To address the disadvantages of traditional fault-tolerant techniques, we propose synchronized operating-function fault tolerance, an error management module, and an agent-based dynamic recovery protocol, among other techniques. As a result, the system failure rate is reduced and overall performance improves.
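The three-level structure described above can be illustrated with a small sketch. It is not the DPFS implementation: the abstract does not say how the sub-systems are coordinated, so the sketch assumes a simple escalation from local self-recovery to remote real-time recovery to log-based recovery. All names (Fault, RecoveryLevel, FaultTolerantMechanism, build_ftm) and the remote and permanent flags are hypothetical, and each recovery routine is a placeholder that only decides whether its level applies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Fault:
    site: str                 # node where the error was detected
    description: str
    remote: bool = False      # assumption: the error is on a remote site
    permanent: bool = False   # assumption: it cannot be fixed immediately


@dataclass
class RecoveryLevel:
    name: str
    try_recover: Callable[[Fault], bool]  # True if this level handled the fault


@dataclass
class FaultTolerantMechanism:
    levels: List[RecoveryLevel]
    error_log: List[str] = field(default_factory=list)

    def handle(self, fault: Fault) -> Optional[str]:
        # Detection step: save information about the error first.
        self.error_log.append(f"{fault.site}: {fault.description}")
        # Assumed escalation: try each level in order until one succeeds.
        for level in self.levels:
            if level.try_recover(fault):
                return level.name
        return None


def local_self_recovery(fault: Fault) -> bool:
    # Placeholder: transient faults at the local site are fixed in place.
    return not fault.remote and not fault.permanent


def remote_real_time(fault: Fault) -> bool:
    # Placeholder: stands in for sending the correct operations to the
    # remote site and executing them there.
    return fault.remote and not fault.permanent


def log_based(fault: Fault) -> bool:
    # Placeholder: stands in for recovery from a log; it is the last
    # resort, so it accepts any fault that reaches it.
    return True


def build_ftm() -> FaultTolerantMechanism:
    return FaultTolerantMechanism(levels=[
        RecoveryLevel("local self-recovery", local_self_recovery),
        RecoveryLevel("remote real-time", remote_real_time),
        RecoveryLevel("log-based", log_based),
    ])
```

Calling build_ftm().handle(Fault("node-a", "checksum mismatch")) returns the name of the level that handled the fault. Ordering the levels from cheapest to most expensive is a design choice of this sketch, not something the abstract states.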
Keywords/Search Tags: Fault diagnosing, Local Recovery, Remote Real-time Fault-Tolerant, Log-based Fault-Tolerant, Dynamic Recovery Protocol, Distributed and Parallel File Systems