Research On Fault Tolerance And Real-Time In Distributed And Interactive Simulation

Posted on: 2007-11-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ma
Full Text: PDF
GTID: 1118360215470565
Subject: Computer Science and Technology
Abstract/Summary:
Simulation is the study of an object whose characteristics are unknown by constructing and studying a model of it according to the homology principle. Since their emergence, simulation systems have evolved through SIMNET, DIS, ALSP and HLA, and they are now widely used in military, economic and social applications. As simulation systems have developed, simulations have grown larger in scale and longer in duration, yet reliability has not kept pace, and the real-time problem in distributed simulation remains unsolved. The important role of simulation systems in the national economy requires these problems to be solved as early as possible.

There are few studies of fault tolerance in HLA and complex system simulation. Moreover, most of them apply methods from general distributed systems without considering the characteristics of simulation systems, so these schemes may reduce simulation efficiency. Some research on real-time extensions for simulation has been done, but it is built on special-purpose simulation systems and cannot be used in general simulation architectures.

To provide fault tolerance in simulation systems with minimal overhead, we study the following aspects:

Replica method: we present several fault tolerance frameworks for HLA that take both the HLA rules and fault tolerance into account. To solve the critical problem of read/write operations in a replicated system, we propose using a Byzantine quorum system. Unlike the read-one/write-all protocol, it reads data more reliably and balances the cost of reads and writes more evenly.

Checkpoint method: by analyzing the Time Warp algorithm, we propose a probabilistic model to evaluate the overhead of the checkpoint algorithm.
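The abstract does not give the probability model itself. As a hedged illustration of how checkpoint overhead can be evaluated, the sketch below uses Young's classical first-order approximation as a stand-in; it is not the dissertation's model. With checkpoint cost C and mean time between failures M, the expected overhead per unit of useful work for a checkpoint interval T is roughly C/T + T/(2M), which is smallest at T = sqrt(2CM). The function names and the numbers are assumptions made for illustration.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order expected overhead per unit of useful work.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure
                                 (on average half an interval is lost)
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Interval that minimizes overhead_fraction (Young's approximation)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical numbers: 5 s to save state, one failure per 24 h on average.
C = 5.0
M = 24 * 3600.0
t_opt = young_interval(C, M)

if __name__ == "__main__":
    print(f"interval ~ {t_opt:.1f} s, "
          f"overhead ~ {overhead_fraction(t_opt, C, M):.4f}")
```

At the minimizing interval the two cost terms are equal, which is a quick sanity check on any measured checkpoint cost and failure rate.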
An adaptive checkpoint method is introduced, which sets asynchronous checkpoints according to the optimistic Time Warp algorithm and forced checkpoints according to the fault tolerance requirement. Necessary and sufficient conditions for reaching the optimal checkpoint interval are derived, and several types of probability models are discussed in detail.

Passive backup: we present a novel fault tolerance protocol that combines causal message logging with the primary-backup scheme. The protocol uses an iterative backup location scheme and an adaptive update interval to reduce overhead and to balance the cost of fault tolerance against recovery time. It produces no orphan states and does not require surviving processes to roll back. Most importantly, the recovery scheme can tolerate concurrent failures, even the permanent failure of a single node. The correctness of the protocol is proved, and experiments show that it is efficient.

Besides fault tolerance methods, we also study how to improve real-time performance in distributed simulation systems.

Alpha-beta simulation branch cutting: we introduce a method that uses simulation cloning to examine alternative scenarios concurrently within a single simulation. An alpha-beta branch-cutting method is proposed to prune unimportant simulation branches, and an HLA-based framework is designed to implement branch cutting and federation cloning. Experiments show that branch cutting can accelerate simulation and can be widely used in military simulations.

Real-time fault-tolerant scheduling: real-time behavior and fault tolerance are the two aspects to be considered in real-time systems. The primary-backup scheme is commonly used to satisfy both requirements, but the backup copies of tasks occupy many system resources. To maximize resource utilization, we introduce the concept of overlap degree.
Combining backup-backup overloading and primary-backup overloading, a heuristic MSO (Minimal Slot Occupy) algorithm is proposed to schedule real-time tasks. Experimental results show that the new scheduling algorithm is effective. The time complexity of the heuristic scheduling algorithm is O(nm^2), and it can be widely used in real-time systems.

State estimation: in WAN and distributed environments, network transport delay and network partitions are very difficult to handle. Based on an analysis of simulation models, we divide system models into two types: continuous-state models and discrete-state models. Hermite interpolation and Markov process theory are proposed to handle network delay and partitions by substituting estimated data for the actual computed data, so that the simulation system can ride through transient failures or data transport delays. Consequently, the real-time and fault tolerance characteristics of the whole simulation system are improved. Finally, experimental results show that the proposed method is effective for continuous system models with a small step (lookahead).
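For the continuous-state case, the idea of substituting an estimate for a delayed update can be illustrated with cubic Hermite interpolation. This is a minimal sketch, assuming each received update carries a timestamp, a scalar state value and its time derivative. The `Sample` type and the `hermite_estimate` function are hypothetical names, and evaluating the cubic slightly past the latest sample is an assumption about how a late update might be bridged, not a detail confirmed by the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One received state update (hypothetical message layout)."""
    t: float  # timestamp of the update
    x: float  # state value at time t
    v: float  # time derivative of the state at time t


def hermite_estimate(a: Sample, b: Sample, t: float) -> float:
    """Evaluate the cubic Hermite polynomial through samples a and b at time t.

    For a.t <= t <= b.t this interpolates; for t a little beyond b.t it
    extrapolates, which is how a missing update could be approximated.
    """
    h = b.t - a.t
    if h <= 0:
        raise ValueError("samples must have strictly increasing timestamps")
    s = (t - a.t) / h
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1   # weight of a.x
    h10 = s3 - 2 * s2 + s       # weight of a.v (scaled by h)
    h01 = -2 * s3 + 3 * s2      # weight of b.x
    h11 = s3 - s2               # weight of b.v (scaled by h)
    return h00 * a.x + h10 * h * a.v + h01 * b.x + h11 * h * b.v


if __name__ == "__main__":
    # Example trajectory x(t) = t^2, so v(t) = 2t.
    first = Sample(t=0.0, x=0.0, v=0.0)
    second = Sample(t=1.0, x=1.0, v=2.0)
    print(hermite_estimate(first, second, 0.5))  # between the two samples
    print(hermite_estimate(first, second, 1.2))  # shortly after the last one
```

A cubic Hermite polynomial reproduces any trajectory of degree three or lower exactly, so the estimate is most trustworthy when the state varies smoothly and the step is short, which is consistent with the condition reported in the abstract.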
Keywords/Search Tags: Distributed interactive simulation, High Level Architecture (HLA), fault tolerance, causal logging, primary-backup, real time
Related items