Research On Data Fault-Tolerance Of Parallel File System

Posted on: 2005-02-05 Degree: Master Type: Thesis
Country: China Candidate: J J Yang Full Text: PDF
GTID: 2168360152969187 Subject: Computer software and theory
Abstract/Summary:
With the emergence of high-speed network technology and high-performance workstations in recent years, clusters have become an ideal platform for parallel computing. As cluster scale increases, so does the probability of a system fault. In a cluster environment, a failure is very harmful to long-running applications, so fault tolerance is very important: it allows a program to continue running despite the failure of some nodes of the cluster. The parallel file system is a vital component of the cluster. By employing a parallel file system, a cluster can use the local disk space of every node to obtain parallel I/O instead of relying on expensive hardware RAID. Research on the fault tolerance of parallel file systems is therefore very important.

There are two main methods for implementing fault tolerance in a parallel file system: software RAID and data replication. Neither method maintains the load balance of the cluster system. Based on PVFS, this thesis employs a new data-replication method to implement fault tolerance in a parallel file system. The method first distributes a file across a group of I/O nodes, then duplicates every sub-file of that file on other nodes. Metadata management is very flexible: the user application can decide how many replicas it wants, so different files can have different levels of reliability.

The data-placement policy declusters the replicated data of the files on one I/O node across all I/O nodes in the cluster. This benefits the workload balance and scalability of the cluster. If one I/O node fails, its workload is spread across the whole cluster. When the I/O node comes back, the recovery workload is also spread across the whole cluster. Compared with other replication methods, this replication scheme therefore gives better workload balance and makes the cluster highly scalable.
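
The declustered placement idea described in the abstract can be illustrated with a small sketch. This is not the placement rule from the thesis, which the abstract does not specify, and it does not use any PVFS API. The function names, the round-robin striping, and the rotating replica offset are all assumptions chosen only to show how replicas of the sub-files stored on one I/O node can end up spread over all the other nodes.

```python
from collections import Counter


def place_subfile(file_id, stripe_index, num_nodes, num_replicas):
    """Return [primary, replica_1, ...] node ids for one sub-file.

    Hypothetical rule (an assumption, not the thesis's policy):
    - the primary copy is striped round-robin from a per-file base node;
    - each replica sits at a non-zero offset from the primary, and the
      first offset rotates with the file id, so different files that
      share a primary node send their replicas to different nodes.
    """
    if num_nodes < 2:
        raise ValueError("need at least two I/O nodes to replicate")
    if not 0 <= num_replicas <= num_nodes - 1:
        raise ValueError("num_replicas must be between 0 and num_nodes - 1")

    primary = (file_id + stripe_index) % num_nodes
    first = file_id % (num_nodes - 1)
    nodes = [primary]
    for r in range(num_replicas):
        # Offsets fall in 1..num_nodes-1, so a replica never lands on the
        # primary, and consecutive offsets keep the replicas distinct.
        offset = 1 + (first + r) % (num_nodes - 1)
        nodes.append((primary + offset) % num_nodes)
    return nodes


def takeover_load(failed_node, num_files, stripe_width, num_nodes, num_replicas):
    """Count, per surviving node, how many sub-files it would serve
    from its first replica if failed_node went down."""
    load = Counter()
    for file_id in range(num_files):
        for stripe_index in range(stripe_width):
            nodes = place_subfile(file_id, stripe_index, num_nodes, num_replicas)
            if nodes[0] == failed_node and len(nodes) > 1:
                load[nodes[1]] += 1
    return load


if __name__ == "__main__":
    # 5 I/O nodes, files striped over 3 nodes, one replica per sub-file.
    print(takeover_load(failed_node=0, num_files=20, stripe_width=3,
                        num_nodes=5, num_replicas=1))
```

Because `num_replicas` is a per-call argument, each file could be given its own replica count, which mirrors the per-file reliability the abstract mentions. In this toy configuration the sub-files of the failed node are taken over by all four surviving nodes rather than by a single mirror node.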
Keywords/Search Tags: fault tolerance, consistency, data replication, data striping, parallel file system