
System reliability through algorithm-based fault tolerance and reconfiguration

Posted on: 1999-04-20
Degree: Ph.D.
Type: Thesis
University: Oregon State University
Candidate: Ramanathan, Gowri
GTID: 2468390014473500
Subject: Computer Science
Abstract/Summary:
With computers being used in critical and life-impacting applications, system reliability becomes vital. Fault tolerance is a proven approach to improving the reliability of computer systems. In this thesis we have studied two such techniques: reconfiguration and Algorithm-Based Fault Tolerance (ABFT).

ABFT techniques tolerate faults at the system level. They allow the user to decide the degree of fault tolerance needed, and achieving fault tolerance with them is also cost-effective. For these principal reasons, ABFT techniques have been actively researched and applied to several numerical algorithms. Typically, in the ABFT approach the input data for an algorithm are encoded so that errors can be detected or located. The number of redundant computations involved in the encoded data has to be bounded. In our research, we have improved the existing bounds on the redundant computations for many of the problem categories under ABFT.

The new bounds for error detection are derived using Latin Square (LS) arrangements. This is the first time LS arrangements have been applied to these problems. These bounds show significant improvement over the existing bounds. We have derived the bounds for both the $P$ and $P_g$ models for this family of problems.

We have also studied the bounds for error-location problems under ABFT techniques. The results presented in this thesis are the first for this category of problems. We have applied the Chinese Remainder Theorem in novel ways to derive these bounds.

Our research also includes the formulation of a new family of problems that require both error location and error detection. We have obtained bounds for a special case in this new category for both the $P$ and $P_g$ models.

Wafer Scale Integration (WSI) technology holds promise for meeting future demands for computational power. The WSI Augmented Processor Array, proposed earlier this decade, strikes a balance between the overhead of spare processors and tolerance for faults. Our contribution to this topic is the design of an efficient static reconfiguration algorithm.
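ABFT works by encoding an algorithm's input data so that errors can be detected or located. The sketch below illustrates that idea with the classic checksum matrix multiplication of Huang and Abraham. It is a textbook example and not the encoding or bounds developed in this thesis.

The sketch works as follows:

- $A$ gets an extra row of column sums.
- $B$ gets an extra column of row sums.
- Their product then carries both checksums.

A single faulty data element shows up as one inconsistent row and one inconsistent column, and the element at their intersection is the error. The function names, the tolerance `tol`, and the single-error fault model are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column of `a`."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b: Matrix) -> Matrix:
    """Append a column holding the sum of each row of `b`."""
    return [row[:] + [sum(row)] for row in b]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; this is the computation being protected."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def locate_and_correct(c: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product matrix (assumes at most one bad data element).

    Returns None if every row and column sum agrees with its checksum.
    If exactly one row and one column disagree, the faulty element sits at
    their intersection; it is rebuilt from its row checksum and its
    (row, col) position is returned. Any other pattern raises ValueError.
    """
    n, m = len(c) - 1, len(c[0]) - 1  # size of the data part, excluding checksums
    bad_rows = [i for i in range(n) if abs(sum(c[i][:m]) - c[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c[i][j] for i in range(n)) - c[n][j]) > tol]
    if not bad_rows and not bad_cols:
        return None  # error detection: nothing inconsistent
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern not correctable with one checksum row/column")
    i, j = bad_rows[0], bad_cols[0]  # error location: intersection of the two checks
    c[i][j] = c[i][m] - sum(c[i][k] for k in range(m) if k != j)
    return (i, j)


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = matmul(column_checksum(A), row_checksum(B))
    print(locate_and_correct(C))  # None: the encoded product is consistent
    C[1][0] = 40                  # simulate a faulty result (true value is 43)
    print(locate_and_correct(C))  # (1, 0)
    print(C[1][0])                # 43
```

Here the redundancy is one extra row and one extra column. The thesis's results concern how small such redundant computations can be made while still detecting or locating a given number of errors.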
Keywords/Search Tags: ABFT techniques, System, Reliability, Bounds, Fault-tolerance