Fault-Tolerant Techniques Research And Design For On-board Multi-computer Parallel System
Posted on:2011-07-16 | Degree:Master | Type:Thesis | Country:China | Candidate:W C Wang | GTID:2178360305482725 | Subject:Computer Science and Technology
Abstract/Summary:
On-board computers are prone to both hardware and software failures because of the intense radiation in outer space. Fault-tolerance techniques are therefore needed to guarantee the reliability of computer systems running on satellites.
This thesis presents a new design for an on-board multi-machine parallel system and implements a prototype of it. The system uses a distributed multi-node parallel architecture that offers good reconfigurability and a degree of generality. On top of this architecture, we design several fault-tolerant schemes built on a hierarchical fault-tolerance and failure-detection-recovery mechanism. These schemes can handle failures caused by the harsh space environment and effectively improve system reliability.
The main contributions of this thesis are as follows:
① Design a dynamic, client-server-based, multi-node parallel architecture for on-board computers. The system has no centralized control component; all management functions are carried out through policies that the distributed nodes decide jointly.
② Propose the concept of the Alternative Primary Node (APN), which monitors the state of the Primary Node (PN) and takes over from the PN when the PN fails. The APN improves system reliability and real-time fault-tolerance capability. Present the concept of a global state table and the memory access method for this table in the multi-machine parallel architecture. Define failure-monitoring communication between nodes and its different types.
③ Analyze and summarize the failures of the on-board multi-machine parallel system using Failure Mode and Effects Analysis (FMEA). Based on this analysis, design a hierarchical fault-tolerance and failure-detection-recovery mechanism, and study several fault-tolerant schemes built on it.
④ Design and implement a prototype of the on-board multi-machine parallel system on the VxWorks development platform. The prototype simulates the underlying hardware architecture and the basic operating mechanisms, and several fault-tolerant schemes are implemented in this simulation environment.
⑤ Model and analyze the on-board multi-machine parallel system using a Stochastic Petri Net (SPN).
Keywords/Search Tags: Fault-tolerant techniques, On-board computers, FMEA, Hierarchical failure detection, VxWorks
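The APN idea in contribution ② can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the abstract gives no protocol details, so the heartbeat mechanism, the `HEARTBEAT_TIMEOUT` value, the `Node` and `Cluster` types, the use of a plain dictionary as a stand-in for the global state table, and the rule for choosing the next APN (lowest-numbered live worker) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical: ticks without a heartbeat before the PN is declared failed.
HEARTBEAT_TIMEOUT = 3


@dataclass
class Node:
    node_id: int
    role: str               # "PN", "APN", "WORKER", or "FAILED"
    alive: bool = True
    last_heartbeat: int = 0


@dataclass
class Cluster:
    # Stand-in for a global state table: node id -> node record (assumption).
    nodes: Dict[int, Node] = field(default_factory=dict)

    def find(self, role: str) -> Optional[Node]:
        """Return the first node currently holding the given role, if any."""
        for node in self.nodes.values():
            if node.role == role:
                return node
        return None

    def heartbeat(self, node_id: int, now: int) -> None:
        """Record a heartbeat; a dead node sends nothing."""
        node = self.nodes[node_id]
        if node.alive:
            node.last_heartbeat = now

    def apn_check(self, now: int) -> bool:
        """APN inspects the PN's heartbeat; returns True if it took over."""
        pn, apn = self.find("PN"), self.find("APN")
        if pn is None or apn is None or not apn.alive:
            return False
        if now - pn.last_heartbeat < HEARTBEAT_TIMEOUT:
            return False  # PN still looks healthy
        # Failover: mark the old PN failed and promote the APN.
        pn.role = "FAILED"
        apn.role = "PN"
        apn.last_heartbeat = now
        # Hypothetical policy: the lowest-numbered live worker becomes the new APN.
        workers = sorted(
            n.node_id for n in self.nodes.values()
            if n.role == "WORKER" and n.alive
        )
        if workers:
            self.nodes[workers[0]].role = "APN"
        return True


def demo() -> Dict[int, str]:
    """Simulate seven ticks in which the PN (node 0) dies at tick 3."""
    roles = ["PN", "APN", "WORKER", "WORKER"]
    cluster = Cluster({i: Node(i, r) for i, r in enumerate(roles)})
    for now in range(1, 8):
        if now == 3:
            cluster.nodes[0].alive = False
        for node_id in cluster.nodes:
            cluster.heartbeat(node_id, now)
        cluster.apn_check(now)
    return {n.node_id: n.role for n in cluster.nodes.values()}
```

Calling `demo()` returns the final role of each node after the simulated PN failure. Promoting a new APN immediately after takeover keeps a standby available for the next failure, which is one plausible reading of how the scheme avoids a centralized controller.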