Research And Implementation Of Dual Fault-tolerant Embedded System Based On Intensive Computing Application

Posted on: 2015-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Wang
GTID: 2268330428475969
Subject: Computer application technology
Abstract/Summary:
With the rapid development of ARM processors, they have come to be widely used in intensive computing fields. Because these fields demand high reliability, designing a highly reliable system has become a critical issue.
The checkpoint mechanism and the dual-computer fault-tolerant mechanism are both effective ways to improve reliability: the checkpoint mechanism shortens the recovery time of a task, and the dual-computer fault-tolerant mechanism tolerates permanent faults. However, the traditional fault-tolerant system does not incorporate the checkpoint mechanism. Once a fault occurs in the embedded system, the task can only be restarted from the beginning, which wastes a great deal of time. As a result, the traditional fault-tolerant embedded system can only be used in the industrial control field. Building on the traditional dual fault-tolerant embedded system and the checkpoint mechanism, this paper proposes a new dual fault-tolerant embedded system based on the checkpoint mechanism that can be used in intensive computing fields.
The length of the checkpoint interval directly affects both reliability and system overhead. This paper proposes two checkpoint optimization models. The first is a checkpoint interval model based on the task deadline, which can analyze the probability that a task completes before its deadline. The second, built on the first, is a checkpoint interval model based on a multistage checkpoint mechanism, which reduces fault detection time by using two levels of checkpoints. When a second-level checkpoint takes less time than a first-level checkpoint, the multistage model performs better than the deadline-based one. This paper also designs a dual fault-tolerant embedded system based on the optimal checkpoint interval, which can repair transient faults and tolerate permanent faults.
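The trade-off between checkpoint interval, reliability, and overhead can be illustrated with a small sketch. The abstract does not give the thesis's own deadline-based or multistage models, so the code below swaps in Young's well-known first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF), purely as a stand-in. The function names and the sample numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    This is NOT the model proposed in the thesis; it is a standard textbook
    approximation used here only to illustrate the trade-off.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def fault_free_runtime(work_s: float, interval_s: float,
                       checkpoint_cost_s: float) -> float:
    """Total run time when no fault occurs: useful work plus the cost of
    one checkpoint for every full interval of work."""
    checkpoints = math.floor(work_s / interval_s)
    return work_s + checkpoints * checkpoint_cost_s


def meets_deadline(work_s: float, interval_s: float,
                   checkpoint_cost_s: float, deadline_s: float) -> bool:
    """Check only the fault-free case; a fuller model would also account
    for rollback and recovery time after a fault."""
    return fault_free_runtime(work_s, interval_s, checkpoint_cost_s) <= deadline_s


if __name__ == "__main__":
    # Hypothetical numbers: 2 s per checkpoint, one fault per hour on average.
    interval = young_interval(checkpoint_cost_s=2.0, mtbf_s=3600.0)
    print(f"interval = {interval:.1f} s")
    print(meets_deadline(600.0, interval, 2.0, deadline_s=620.0))
```

A shorter interval loses less work per fault but pays the checkpoint cost more often, which is why the fault-free run time grows as the interval shrinks.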
The system comprises three modules: the communication subsystem, the fault detection subsystem, and the fault recovery subsystem. One factor limiting checkpoint performance is the I/O operations that access disk storage. This paper proposes a modified write procedure that first transfers checkpoint data into a temporary storage buffer. This scheme reduces the frequency of I/O operations and achieves a 36% performance improvement over the original write procedure. Finally, two intensive computing algorithms (matrix multiplication and the SUSAN algorithm) are used to test the system. The results show that the optimal checkpoint algorithm improves reliability. This paper thus offers a new method for building a highly reliable embedded system for intensive computing applications.
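The buffered write idea described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the class name `BufferedCheckpointWriter`, the `buffer_limit` parameter, and the flush policy are all assumptions, and an in-memory `io.BytesIO` stands in for disk storage.

```python
import io


class BufferedCheckpointWriter:
    """Collects checkpoint data in a temporary buffer and writes it to the
    underlying storage in larger batches (hypothetical sketch)."""

    def __init__(self, sink, buffer_limit: int = 4096):
        self._sink = sink              # any object with write() and flush()
        self._limit = buffer_limit     # flush once this many bytes are pending
        self._buffer = bytearray()
        self.flush_count = 0           # number of actual writes to storage

    def write(self, data: bytes) -> None:
        # Append to the in-memory buffer; touch storage only when it is full.
        self._buffer.extend(data)
        if len(self._buffer) >= self._limit:
            self.flush()

    def flush(self) -> None:
        # Write all pending bytes in a single I/O operation.
        if self._buffer:
            self._sink.write(bytes(self._buffer))
            self._sink.flush()
            self._buffer.clear()
            self.flush_count += 1


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = BufferedCheckpointWriter(sink, buffer_limit=8)
    for _ in range(4):
        writer.write(b"abcd")          # four small checkpoint fragments
    writer.flush()                     # push out anything still pending
    print(writer.flush_count, len(sink.getvalue()))
```

One design consequence of any such buffer is that data still held in memory is lost if a fault occurs before the flush, so the buffer limit trades I/O frequency against how much checkpoint data is at risk.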
Keywords: Checkpoint, Fault-tolerance, Embedded system, Reliability, Markov model