Scalability and fault tolerance in global computing

Posted on: 2003-02-28
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Neary, Michael Oliver
Full Text: PDF
GTID: 1468390011485191
Subject: Computer Science
Abstract/Summary:
Javelin 3 is a Java-based software system for scalable, fault-tolerant, adaptively parallel “Global” (a.k.a. Internet or Grid) computing. Projects such as SETI@home have recently attracted considerable popular interest in this field; however, they are largely geared to specific applications. Javelin 3 is intended to free application developers from concerns about complex inter-processor communication, task scheduling, and fault tolerance among networked hosts. When all or part of an application can be cast as a master-worker or a branch-and-bound computation, Javelin 3 allows developers to focus on the underlying application. The dissertation highlights the scalability and fault tolerance of Javelin 3, and the distributed work stealing and advanced eager scheduling mechanisms it uses. The scheduling strategy enables dynamic task decomposition, which improves load balancing when the computational load of tasks is non-uniform and becomes evident only at execution time. We provide an analysis of the expected performance degradation due to unresponsive hosts, together with measurements of the actual degradation. We also present speedup measurements of a large-scale branch-and-bound application, using up to 1,024 hosts.
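The eager scheduling idea mentioned in the abstract can be illustrated with a minimal, single-threaded sketch. It is not Javelin 3's API or implementation: the class `EagerSchedulerSketch`, its methods, and the round-robin host simulation are all hypothetical names chosen for illustration. The sketch assumes the general form of eager scheduling, in which a task that has not yet been reported complete may be handed out again to any host that asks for work, so that an unresponsive host cannot block the computation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Javelin 3.
class EagerSchedulerSketch {
    // Task ids that may still need work (may contain already-finished ids).
    private final Deque<Integer> pending = new ArrayDeque<>();
    // First reported result for each finished task.
    private final Map<Integer, Long> results = new HashMap<>();

    EagerSchedulerSketch(int taskCount) {
        for (int id = 0; id < taskCount; id++) {
            pending.addLast(id);
        }
    }

    // Returns an unfinished task id, or -1 when every task is finished.
    // The task is moved to the back of the queue rather than removed,
    // so it can be reassigned if its current host never reports back.
    int nextTask() {
        while (!pending.isEmpty()) {
            int id = pending.pollFirst();
            if (!results.containsKey(id)) {
                pending.addLast(id);
                return id;
            }
            // Finished tasks are simply dropped from the queue.
        }
        return -1;
    }

    // Records a result; a duplicate result for the same task is ignored.
    void complete(int id, long value) {
        results.putIfAbsent(id, value);
    }

    // Simulates hosts requesting work in round-robin order. Hosts listed in
    // `unresponsive` accept a task but never report a result.
    static Map<Integer, Long> simulate(int taskCount, int hostCount,
                                       Set<Integer> unresponsive) {
        int responsive = 0;
        for (int h = 0; h < hostCount; h++) {
            if (!unresponsive.contains(h)) {
                responsive++;
            }
        }
        if (responsive == 0) {
            throw new IllegalArgumentException("need at least one responsive host");
        }
        EagerSchedulerSketch scheduler = new EagerSchedulerSketch(taskCount);
        int host = 0;
        while (true) {
            int id = scheduler.nextTask();
            if (id < 0) {
                break; // all tasks finished
            }
            if (!unresponsive.contains(host)) {
                scheduler.complete(id, (long) id * id); // stand-in for real work
            }
            host = (host + 1) % hostCount;
        }
        return scheduler.results;
    }

    public static void main(String[] args) {
        Set<Integer> dead = new HashSet<>(Arrays.asList(1, 3));
        Map<Integer, Long> results = simulate(8, 4, dead);
        System.out.println("finished tasks: " + results.size());
    }
}
```

In this sketch, tasks taken by the unresponsive hosts stay in the queue and are later picked up by responsive hosts, at the cost of some redundant assignments. A real system would also need networking, concurrency control, and the work stealing and task decomposition described in the abstract, none of which are modeled here.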
Keywords/Search Tags: Fault, Application, Hosts