
Achieving high availability with commodity hardware and software

Posted on: 2009-04-20
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Aggarwal, Nidhi
Full Text: PDF
GTID: 1448390002995083
Subject: Engineering
Abstract/Summary:
Scaling integrated circuit technology into the deep submicron regime is expected to increase both soft and hard error rates significantly. Providing high availability in the presence of unreliable components will become an increasingly important requirement for a diverse set of systems. Traditionally, high availability systems have used specialized hardware and software that cater to a small subset of application domains, such as banking and mission-critical applications. However, custom-designed hardware and/or software is not a viable solution for the cost-competitive commodity market because both commodity hardware and software are resistant to significant changes.

I propose a set of techniques that can be used to adapt commodity hardware and software for use as building blocks for future high availability systems. I propose a new chip multiprocessor architecture called configurable isolation chip multiprocessor (CI CMP) that provides low-level isolation for fault containment and reconfiguration, through cost-effective modifications to commodity designs. Specifically, the CI CMP architecture introduces a minimal amount of hardware support for dynamic repartitioning of CMP hardware into multiple fault zones. Results show that the CI CMP architecture is superior both for low-availability and high-availability implementations, while still using general-purpose commodity components.

To enable availability for commodity software, I propose using a virtual machine monitor (VMM) that can provide transparent redundancy to "off-the-shelf" software. Using a VMM enables fault tolerance with no changes to any commodity software, including the OS, runtime software and application software. The VMM manages the creation and synchronization of redundant threads and performs recovery in the event of a failure. Results show that for a diverse set of benchmarks the synchronization overhead lies between 3% and 14%.

Finally, I propose techniques that can reduce the cost of full duplication of memory by duplicating only those memory pages that are written and sharing pages that are only read. Computational errors are not propagated to read-only pages, and they can be protected by traditional memory protection techniques like ECC, chip kill, DRAM line sparing. A duplication cache can reduce the overheads of duplication by 90% with less than 10% performance degradation.
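
The idea of duplicating only written pages can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the class `RedundantMemory` and its methods are hypothetical names, two redundant replicas are assumed, and pages are modeled as byte strings keyed by page number. Pages start out shared; the first write to a page gives each replica its own copy, so only written pages are duplicated and later compared.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantMemory:
    """Toy model (hypothetical): two replicas share a page until one writes to it."""

    # Page number -> contents visible to both replicas (never modified in place).
    shared: dict[int, bytes]
    # One dict per replica holding private copies of pages that have been written.
    private: list[dict[int, bytearray]] = field(default_factory=lambda: [{}, {}])

    def read(self, replica: int, page: int) -> bytes:
        # A replica sees its private copy if the page was written, else the shared page.
        if page in self.private[replica]:
            return bytes(self.private[replica][page])
        return self.shared[page]

    def write(self, replica: int, page: int, offset: int, value: int) -> None:
        # The first write to a page duplicates it for both replicas.
        if page not in self.private[replica]:
            for copies in self.private:
                copies.setdefault(page, bytearray(self.shared[page]))
        self.private[replica][page][offset] = value

    def duplicated_fraction(self) -> float:
        # Fraction of all pages that had to be duplicated.
        return len(self.private[0]) / len(self.shared)

    def mismatched_pages(self) -> list[int]:
        # Written pages whose two copies differ, i.e. a possible computational error.
        return sorted(
            p for p in self.private[0] if self.private[0][p] != self.private[1][p]
        )
```

In this model, read-only pages exist once and would rely on conventional memory protection, while a divergence between the replicas shows up only in the duplicated pages returned by `mismatched_pages`.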
Keywords/Search Tags:Software, High availability, Commodity, Hardware, CI CMP