
Reconfigurable fault tolerance for space systems

Posted on: 2014-01-29 | Degree: Ph.D. | Type: Dissertation
University: University of Florida | Candidate: Jacobs, Adam M
GTID: 1458390005999897 | Subject: Engineering
Abstract/Summary:
Commercial SRAM-based field-programmable gate arrays (FPGAs) can provide space applications with the performance, energy efficiency, and adaptability needed to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) is the traditional mitigation technique, but it incurs substantial overhead, chiefly increased area and power. To reduce this overhead while still providing sufficient radiation mitigation, this research proposes a framework for reconfigurable fault tolerance (RFT) that enables system designers to dynamically adjust a system's level of redundancy and fault mitigation according to the radiation encountered at different orbital positions. To realize this goal and validate the effectiveness of the approach, three areas are investigated.
First, quantifying the effectiveness of the RFT approach requires a method for accurately estimating time-varying fault rates in space systems, together with a reliability and performance model for adaptive systems. Across multiple case-study orbits, our models predict that adaptive fault-tolerance strategies reduce unavailability by 85% compared with low-overhead fault-tolerance techniques and improve performability by 128% over traditional, static TMR.
Second, low-overhead fault-tolerance techniques that can be used within the RFT framework to improve performance are investigated. The effectiveness of algorithm-based fault tolerance (ABFT) on FPGA-based systems is explored for matrix multiplication and the fast Fourier transform (FFT). ABFT kernels were developed for an FPGA platform, and their reliability was measured through fault-injection testing. We show that ABFT-protected matrix multiplication and FFT kernels improve reliability (reducing vulnerability by 98%) with low resource overhead, and that they scale favorably with additional parallelism.
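The checksum principle behind ABFT for matrix multiplication can be sketched in a few lines. The following is a minimal software illustration of the classic Huang and Abraham full-checksum scheme, not the FPGA kernels developed in this work, whose encoding and datapath may differ; all function names are hypothetical. It appends a column-checksum row to A and a row-checksum column to B, so the product carries its own row and column sums. It assumes at most one corrupted element (a single-upset model) and uses integers so the equality checks are exact; floating-point kernels, including FFTs, would instead compare against a rounding tolerance.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product (stand-in for the compute kernel)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


def add_column_checksum(a: Matrix) -> Matrix:
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b: Matrix) -> Matrix:
    """Append to each row of b one extra entry holding that row's sum."""
    return [row + [sum(row)] for row in b]


def checked_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Full-checksum product: last row/column hold column/row sums of a @ b."""
    return matmul(add_column_checksum(a), add_row_checksum(b))


def data_part(cf: Matrix) -> Matrix:
    """Strip the checksum row and column, leaving the actual product."""
    return [row[:-1] for row in cf[:-1]]


def verify_and_correct(cf: Matrix) -> Optional[Tuple[int, int]]:
    """Check row and column sums; fix a single corrupted data element in place.

    Returns the (row, col) that was corrected, or None if the data part is
    consistent. Assumes at most one corrupted element (single-upset model).
    """
    n, m = len(cf) - 1, len(cf[0]) - 1
    bad_rows = [i for i in range(n) if sum(cf[i][:m]) != cf[i][m]]
    bad_cols = [j for j in range(m)
                if sum(cf[i][j] for i in range(n)) != cf[n][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Recompute the faulty element from its row checksum.
        cf[i][j] = cf[i][m] - sum(cf[i][t] for t in range(m) if t != j)
        return (i, j)
    if not bad_rows or not bad_cols:
        # No error, or only a checksum entry was hit: the data part is intact.
        return None
    raise ValueError("multiple errors detected; recompute instead")
```

For example, with `a = [[1, 2], [3, 4]]` and `b = [[5, 6], [7, 8]]`, corrupting one element of the product makes exactly one row check and one column check fail; their intersection locates the faulty element, which is then recomputed from its row checksum. For n-by-n inputs the extra checksum row and column add only O(n²) multiply-adds to the O(n³) product, which is why ABFT is far cheaper than triplicating the whole kernel.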
Third, methods for integrating RFT hardware into existing partial-reconfiguration (PR)-based systems and architectures are explored. We extend the RFT framework for use with bus-based or point-to-point architectures, and we design a fault-tolerant task-scheduling algorithm that schedules RFT tasks in a dynamically changing fault environment to maximize system performability. Together, these three areas demonstrate that RFT can provide both performance and reliability in space: by combining low-overhead fault-tolerance techniques with reconfiguration, RFT can meet the strict constraints of next-generation space systems.
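The central RFT idea, choosing a fault-tolerance mode for each orbital segment according to its expected fault rate, can be illustrated with a deliberately simplified model. This sketch is not the reliability or performability model developed in this work. It assumes upsets arrive as a Poisson process, gives each mode a hypothetical relative throughput and a hypothetical fraction of upsets that escape mitigation, and ignores the time needed to partially reconfigure the FPGA when switching modes. The names `FTMode`, `expected_work`, and `choose_modes`, and all numbers, are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FTMode:
    name: str
    throughput: float     # hypothetical relative useful work per unit time
    vulnerability: float  # hypothetical fraction of upsets causing a failure


def expected_work(mode: FTMode, upset_rate: float, interval: float) -> float:
    """Simplified stand-in for performability over one interval.

    Upsets are assumed Poisson with rate upset_rate; each independently causes
    an unmitigated failure with probability mode.vulnerability, so failures are
    Poisson with rate upset_rate * vulnerability. Work counts only if the
    interval finishes with no failure.
    """
    p_no_failure = math.exp(-upset_rate * mode.vulnerability * interval)
    return mode.throughput * interval * p_no_failure


def choose_modes(rates: Dict[str, float], modes: List[FTMode],
                 interval: float) -> Dict[str, str]:
    """Pick, for each segment, the mode with the highest expected work."""
    return {segment: max(modes, key=lambda m: expected_work(m, rate, interval)).name
            for segment, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical modes: unprotected (three copies in TMR's area), ABFT, full TMR.
    modes = [FTMode("none", 3.0, 1.0),
             FTMode("abft", 2.5, 0.05),
             FTMode("tmr", 1.0, 0.001)]
    rates = {"low_rate": 0.01, "mid_rate": 5.0, "high_rate": 100.0}
    print(choose_modes(rates, modes, interval=1.0))
    # prints {'low_rate': 'none', 'mid_rate': 'abft', 'high_rate': 'tmr'}
```

With these made-up parameters the sketch selects the unprotected mode at a low upset rate, ABFT at a moderate rate, and TMR at a high rate. That is the qualitative trade-off an adaptive system exploits; a real RFT controller would also weigh reconfiguration cost and use measured fault rates and vulnerabilities.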
Keywords/Search Tags: Space, Fault, RFT, Systems, Reliability, Techniques, Performance