Font Size: a A A

Rationally motivated failure in distributed systems

Posted on:2009-08-22Degree:Ph.DType:Thesis
University:Harvard UniversityCandidate:Shneidman, Jeffrey AllenFull Text:PDF
GTID:2448390005956024Subject:Economics
Abstract/Summary:
Modern distributed systems are under threat from a surprising source: the willful failure by their own users. These failures are not due to chance, nor are they maliciously motivated. Rather, these failures are rationally motivated.;Despite evidence of rationally motivated failure in real systems, there has been surprisingly little work on rational behavior in the context of system fault tolerance. This thesis describes a form of defensive design used to build distributed systems that are robust to rationally motivated failure. Our approach proactively prevents rationally motivated failure by addressing the underlying failure cause. This approach differs markedly from traditional distributed system techniques of dealing with failure, which reactively seek to recover from an expressed failure.;This thesis makes four main contributions toward understanding and designing for rationally motivated failure. This thesis: (1) formalizes faithfulness as the metric by which to judge an algorithm's tolerance to rationally motivated failure. A proof of specification faithfulness is a certification that rational nodes will choose to follow the algorithm specified by the system designer. (2) reveals information revelation failure as an important type of failure expression, and shows that the feasibility and cost of addressing information revelation failure is different from other types of failure. (3) proposes a practical three-part methodology for building systems that can prevent rationally motivated failure expression. (4) applies this methodology to two problems: First, we use the rationally motivated failure remedy methodology to extend an existing interdomain routing protocol and prove the extension to be faithful under the ex post Nash solution concept. We implement the extended algorithm in a simulator and compare the message complexity with the original unfaithful algorithm. Second, we apply the methodology to the problem of distributed consensus to solve a scenario called the Rational Byzantine Generals problem. We design an algorithm that is faithful under the 1-partial ex post Nash solution concept. We implement the algorithm over a network and evaluate the fault tolerance, message complexity, and scalability by comparing the new algorithm with appropriate existing protocols.
Keywords/Search Tags:Failure, Distributed, Systems, Algorithm
Related items