A software system for user application tolerance of network and computing node failures

Posted on: 2003-04-20
Degree: M.S.
Type: Thesis
University: University of Houston-Clear Lake
Candidate: Myers, Byron James
Full Text: PDF
GTID: 2468390011488738
Subject: Computer Science
Abstract/Summary:
This thesis presents a software system that enables user applications to tolerate network and computing node disasters. The system is referred to as the Disaster Recovery Cluster Software System (DRCSS). User applications are written against a DRCSS Application Programming Interface. An active DRCSS starts parallel instances of a user application on one or more autonomous computing nodes, which may be dispersed across local area networks (LANs) and wide area networks. All parallel instances synchronize their internal variables using the computations of a single instance, called the primary instance. The DRCSS reliably synchronizes data between user application instances and detects the failure of a node or a LAN. Upon detecting such a failure, the DRCSS gracefully reforms the cluster, excluding the failed node and/or the nodes of the failed LAN from the newly formed cluster. All surviving application instances continue to run in parallel, with exactly one active primary instance.
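The behavior described in the abstract (a single primary instance, state synchronization, failure detection, and cluster reformation) can be illustrated with a minimal sketch. This is not the DRCSS API, which the abstract does not specify. The names `Instance`, `Cluster`, `heartbeat`, `reform`, and `synchronize`, the heartbeat-timeout failure detector, and the rule that the surviving node with the lowest identifier becomes primary are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One parallel instance of the user application (hypothetical model)."""
    node_id: str
    lan: str                      # LAN the node sits on; a LAN failure silences all its nodes
    last_heartbeat: float = 0.0   # time of the most recent heartbeat seen
    state: dict = field(default_factory=dict)  # the instance's internal variables


class Cluster:
    """Toy cluster: heartbeat-based failure detection plus reformation."""

    def __init__(self, instances, timeout=3.0):
        self.members = {inst.node_id: inst for inst in instances}
        self.timeout = timeout
        # Assumed election rule: lowest node id is the primary.
        self.primary_id = min(self.members) if self.members else None

    def heartbeat(self, node_id, now):
        """Record that a member was alive at time `now`."""
        if node_id in self.members:
            self.members[node_id].last_heartbeat = now

    def reform(self, now):
        """Exclude members whose heartbeat is older than the timeout.

        Returns the sorted ids of excluded nodes and re-elects a primary
        if the old one was among them.
        """
        failed = sorted(
            node_id
            for node_id, inst in self.members.items()
            if now - inst.last_heartbeat > self.timeout
        )
        for node_id in failed:
            del self.members[node_id]
        if self.primary_id not in self.members:
            self.primary_id = min(self.members) if self.members else None
        return failed

    def synchronize(self):
        """Copy the primary's internal variables to every other member."""
        if self.primary_id is None:
            return
        primary = self.members[self.primary_id]
        for inst in self.members.values():
            if inst is not primary:
                inst.state = dict(primary.state)
```

In this sketch, a LAN failure needs no special handling: every node on the failed LAN stops sending heartbeats, so `reform` excludes all of them in a single pass. A real system would also need to handle network partitions and unreliable message delivery, which this sketch ignores.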
Keywords/Search Tags:Application, Software system, Node, Computing, DRCSS, Instances