
Improving utilization and availability of high-performance computing in space

Posted on: 2007-10-09
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Subramaniyan, Rajagopal
GTID: 1448390005965468
Subject: Engineering
Abstract/Summary:
Space missions involving science and defense ventures place ever-increasing demands on the data returned from their resources in space. The traditional approach of gathering data, compressing it, and transmitting it to the ground is no longer viable because of the vast amounts of data involved. Over the past few decades, several research efforts have sought to make high-performance computing (HPC) systems available in space. The idea has been to provide enough on-board processing power to support the many space and Earth exploration and experimentation satellites orbiting Earth and/or exploring the solar system. These efforts have led to small-scale supercomputers embedded in spacecraft and, more recently, to the idea of using commercial-off-the-shelf (COTS) components to provide HPC in space. The susceptibility of COTS components to single-event upsets (SEUs) is a concern, especially since space systems must be self-healing and robust enough to survive a hostile environment. Fault-tolerant system functions are therefore needed to manage the available resources and improve the availability of HPC systems in space. However, fewer resources are available for fault tolerance than in traditional HPC systems on Earth.

Several techniques exist in traditional HPC to provide fault tolerance and improve the overall computation rate, but adapting them to HPC in space is challenging because of these resource constraints. This dissertation addresses that challenge with solutions that improve and complement HPC in space. Three techniques are introduced and investigated, one in each of three phases, to improve the effective utilization and availability of HPC in space. In the first phase, a new model for checkpointing at an optimal rate is developed to increase useful computation time. The results suggest that I/O capabilities far superior to those of present systems are required.
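The abstract does not give the dissertation's checkpointing model. Purely as a point of reference, the sketch below uses Young's classical first-order approximation for the optimal checkpoint interval, T_opt ≈ sqrt(2CM), where C is the time to write one checkpoint and M is the mean time between failures. This is a swapped-in textbook formula, not the author's model; the function names and the numbers in the example are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M).

    checkpoint_cost_s: time C to write one checkpoint (assumed constant).
    mtbf_s: mean time M between failures (e.g. SEU-induced crashes).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order fraction of time not spent on useful work.

    C / T: time spent writing checkpoints per interval of work.
    T / (2M): expected recomputation after a failure, which on average
    strikes halfway through an interval.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    mtbf = 3600.0  # hypothetical: one fault per hour
    for cost in (60.0, 6.0):  # hypothetical slow vs. fast checkpoint I/O
        t_opt = young_interval(cost, mtbf)
        loss = overhead_fraction(t_opt, cost, mtbf)
        print(f"C={cost:>5.1f} s  T_opt={t_opt:7.1f} s  overhead={loss:.1%}")
```

With a hypothetical 60 s checkpoint cost and a one-hour MTBF, the interval comes out near 657 s and roughly 18% of time is lost; cutting the write time to 6 s lowers the loss to under 6%. At the optimum the overhead simplifies to sqrt(2C/M), which is one way to see, under this approximation, why checkpoint I/O speed weighs so heavily on useful computation time.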
In the second phase, the performance of several common HPC scheduling heuristics that can be used to schedule tasks effectively and reduce overall execution time is analyzed through simulation. In the third phase, availability is improved by designing a new lightweight fault-tolerant message-passing middleware. Analyses of applications developed with this middleware show that the robustness of systems in space can be significantly improved without degrading performance. In summary, this dissertation provides novel methodologies to improve utilization and availability in space-based high-performance computing, thereby providing more effective fault tolerance.
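The abstract does not name the scheduling heuristics that were simulated. To make the idea concrete, the sketch below implements Min-Min, a well-known heuristic for mapping independent tasks onto heterogeneous processors; it is an illustrative choice and not necessarily one of those evaluated. The expected-time-to-compute matrix `etc` and the function name `min_min` are assumptions for this example.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Min-Min list scheduling of independent tasks.

    etc[t][m] is the (hypothetical) expected time to run task t on machine m.
    Returns the task -> machine assignment and the resulting makespan.
    """
    n_tasks = len(etc)
    n_machines = len(etc[0]) if n_tasks else 0
    if any(len(row) != n_machines for row in etc) or (n_tasks and n_machines == 0):
        raise ValueError("etc must be a non-empty rectangular matrix")

    ready = [0.0] * n_machines  # time at which each machine becomes free
    unassigned = set(range(n_tasks))
    assignment: Dict[int, int] = {}

    while unassigned:
        best = None  # (completion_time, task, machine)
        for t in sorted(unassigned):  # sorted for deterministic tie-breaking
            # Machine giving task t its earliest completion time.
            m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
            ct = ready[m] + etc[t][m]
            # Keep the task whose best completion time is smallest overall.
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unassigned.remove(t)

    return assignment, max(ready, default=0.0)


if __name__ == "__main__":
    print(min_min([[4.0, 6.0], [3.0, 5.0], [8.0, 2.0]]))
```

For the three-task, two-machine matrix in the example, Min-Min places task 2 on machine 1 first, then tasks 1 and 0 on machine 0, for a makespan of 7. Min-Min favors short tasks first, which keeps early completion times low but can delay long tasks; the related Max-Min heuristic reverses that priority.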
Keywords/Search Tags: Space, High-performance computing, Utilization and availability, HPC, Fault tolerance, Improve, Data