Rationally motivated failure in distributed systems

Posted on:2009-08-22

Degree:Ph.D

Type:Thesis

University:Harvard University

Candidate:Shneidman, Jeffrey Allen

Full Text:PDF

GTID:2448390005956024

Subject:Economics

Abstract/Summary:

Modern distributed systems are under threat from a surprising source: the willful failure by their own users. These failures are not due to chance, nor are they maliciously motivated. Rather, these failures are rationally motivated.;Despite evidence of rationally motivated failure in real systems, there has been surprisingly little work on rational behavior in the context of system fault tolerance. This thesis describes a form of defensive design used to build distributed systems that are robust to rationally motivated failure. Our approach proactively prevents rationally motivated failure by addressing the underlying failure cause. This approach differs markedly from traditional distributed system techniques of dealing with failure, which reactively seek to recover from an expressed failure.;This thesis makes four main contributions toward understanding and designing for rationally motivated failure. This thesis: (1) formalizes faithfulness as the metric by which to judge an algorithm's tolerance to rationally motivated failure. A proof of specification faithfulness is a certification that rational nodes will choose to follow the algorithm specified by the system designer. (2) reveals information revelation failure as an important type of failure expression, and shows that the feasibility and cost of addressing information revelation failure is different from other types of failure. (3) proposes a practical three-part methodology for building systems that can prevent rationally motivated failure expression. (4) applies this methodology to two problems: First, we use the rationally motivated failure remedy methodology to extend an existing interdomain routing protocol and prove the extension to be faithful under the ex post Nash solution concept. We implement the extended algorithm in a simulator and compare the message complexity with the original unfaithful algorithm. Second, we apply the methodology to the problem of distributed consensus to solve a scenario called the Rational Byzantine Generals problem. We design an algorithm that is faithful under the 1-partial ex post Nash solution concept. We implement the algorithm over a network and evaluate the fault tolerance, message complexity, and scalability by comparing the new algorithm with appropriate existing protocols.

Keywords/Search Tags:

Failure, Distributed, Systems, Algorithm

Related items

1	Research On Key Techniques Of Failure Detection In Distributed Systems
2	Research On Qos-oriented Failure Detection Service In Distributed Systems
3	Designing And Implementation Of Failure Detector In Asynchronous Distributed Systems
4	Exploring The Predictability Of Fault Events In Stochastic Discrete Event Systems
5	Social Networks, The Recovery Of The Performance Impact Of Large-scale Distributed Systems Research
6	Research On Failure Detection Of The Disaster Tolerance Storage Systems
7	Research On Technologies Of Adaptive Failure Detection In Distributed Systems
8	On The Disk Failure Recovery Of RDP-based Distributed Storage Systems
9	Study On Parallel Recovery For Node Failure In Heterogeneous Storage Systems
10	Research Of A QoS-based Adaptive Failure Detection Algorithm