The Design And Implementation Of Fault Tolerance System In The Distributed Storage System

Posted on: 2009-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: W N Wang
Full Text: PDF
GTID: 2178360308977785
Subject: Computer system architecture
Abstract/Summary:
With the rapid growth of data and files, distributed storage systems have become widely used. Reliability is one of the most important properties of a distributed storage system, and achieving it requires a combination of techniques.

CFSEA is a Chinese agricultural search engine developed by the information searching lab of NEU. It implements distributed crawling and storage of information, and the distributed storage system is its foundation module. The task of the fault tolerance module is to detect faults and to recover the system when a fault occurs in a SubNode, thereby ensuring the stability and reliability of CFSEA.

This thesis first discusses the architecture of the CFSEA distributed storage system and then focuses on the design and implementation of the fault tolerance module. After comparing and analyzing a number of fault tolerance techniques, the module adopts those best suited to CFSEA.

The fault tolerance system has four sub-modules: fault detection, operation self-recovery, data replication (copy), and log- and checkpoint-based fault tolerance. Fault detection checks the correctness of the data and of the SubNodes. Self-recovery supports simple recovery of operations. Data replication supplies correct data when a SubNode holding that data is down. Log- and checkpoint-based fault tolerance recovers the ManageNode when it is down.

The fault tolerance module ensures the reliability of the distributed system while taking system performance into account.
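The log- and checkpoint-based recovery of the ManageNode can be illustrated with a minimal sketch. This is not the CFSEA implementation, which the abstract does not detail. It assumes, purely for illustration, that the ManageNode state is a mapping from file names to the SubNodes holding them; the class name `ManageNodeState`, the file names `checkpoint.json` and `operations.log`, and the `put`/`delete` operation format are all hypothetical.

```python
import json
import os
import tempfile


class ManageNodeState:
    """Hypothetical ManageNode metadata: file name -> list of SubNode ids."""

    def __init__(self, directory):
        self.checkpoint_path = os.path.join(directory, "checkpoint.json")
        self.log_path = os.path.join(directory, "operations.log")
        self.placement = {}

    def apply(self, op):
        # Apply one operation to the in-memory state (assumed op format).
        if op["type"] == "put":
            self.placement[op["file"]] = op["subnodes"]
        elif op["type"] == "delete":
            self.placement.pop(op["file"], None)

    def record(self, op):
        # Append the operation to the log and force it to disk before applying it.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.apply(op)

    def checkpoint(self):
        # Write the full state to a temporary file, swap it in atomically,
        # then empty the log because the checkpoint now covers it.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.placement, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.checkpoint_path)
        open(self.log_path, "w", encoding="utf-8").close()

    @classmethod
    def recover(cls, directory):
        # Rebuild the state: load the last checkpoint, then replay the log.
        state = cls(directory)
        if os.path.exists(state.checkpoint_path):
            with open(state.checkpoint_path, encoding="utf-8") as f:
                state.placement = json.load(f)
        if os.path.exists(state.log_path):
            with open(state.log_path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        state.apply(json.loads(line))
        return state


def demo():
    # Simulate a crash after two logged operations that follow a checkpoint.
    with tempfile.TemporaryDirectory() as directory:
        node = ManageNodeState(directory)
        node.record({"type": "put", "file": "a.html", "subnodes": [1, 2]})
        node.checkpoint()
        node.record({"type": "put", "file": "b.html", "subnodes": [2, 3]})
        node.record({"type": "delete", "file": "a.html"})
        # The in-memory object is discarded; only the files on disk survive.
        return ManageNodeState.recover(directory).placement
```

Calling `demo()` rebuilds the state from the checkpoint plus the two logged operations. Because `put` and `delete` are idempotent in this sketch, replaying a log that overlaps the checkpoint (for example after a crash between the checkpoint swap and the log truncation) yields the same state.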
Keywords/Search Tags: distributed storage system, fault detection, fault tolerance, copy, log, check point