Fault and defect tolerant computer architectures: Reliable computing with unreliable devices

Posted on: 2007-07-15
Degree: Ph.D.
Type: Dissertation
University: Air Force Institute of Technology
Candidate: Roelke, George R., IV
Full Text: PDF
GTID: 1458390005481554
Subject: Engineering
Abstract/Summary:
As conventional silicon Complementary Metal-Oxide-Semiconductor (CMOS) technology continues to shrink, logic circuits are increasingly subject to errors induced by electrical noise and cosmic radiation. In addition, the smaller devices are more likely to degrade and fail in operation. In the long term, new device technologies such as quantum cellular automata and molecular crossbars may replace silicon CMOS, but they have significant reliability problems. Rather than requiring the circuit to be defect-free, fault tolerance techniques incorporated into an architecture allow continued system operation in the presence of faulty components.

This research addresses construction of a reliable computer from unreliable device technologies. A system architecture is developed for a "fault and defect tolerant" (FDT) computer. Trade-offs between different techniques are studied, and the yield of the system is modelled. Yield and hardware cost models are developed for the fault tolerance techniques used in the architecture.

Fault and defect tolerant designs are created for the processor, and its most critical component, the cache memory. A content-addressable memory (CAM)-based cache design is developed. Simulation results show the cache achieves 90% yield with device failure probabilities of 3 x 10^-6, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10^-6. The hardware redundancy required to achieve this performance is approximately 15 times that of a non-fault-tolerant design. While large compared to fault-tolerant designs used today, this architecture allows the use of devices much more likely to fail than silicon CMOS. Given the size improvements predicted for future device technologies, the hardware overhead may be acceptable.

As part of the work to develop reliable models for fault tolerance techniques, an improved model is developed for NAND Multiplexing, a cornerstone fault-tolerance technique based upon large levels of redundancy. The model is the first exact model for NAND Multiplexing with small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results. An example shows the required hardware redundancy is reduced by 50%.
Keywords/Search Tags:Fault, Device, Architecture, Reliable, Computer, Hardware, Redundancy