The Design And Implementation Of Process-level Fault-tolerant System Based On Checkpoint Optimization

Posted on: 2015-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wang
GTID: 2308330464464561
Subject: Computer system architecture
Abstract/Summary:
In large-scale computing environments, failures are inevitable and the resulting losses are substantial. Fault-tolerance techniques can reduce the impact of failures and improve system reliability to a certain extent. Process-level fault tolerance applies the fault-tolerance mechanism to the running task itself, so that the task keeps running and recovers quickly after a failure; checkpointing with rollback recovery is a common method. The process-level fault-tolerant system designed here is likewise based on checkpoints. To implement the system, two major issues must be addressed: when to take checkpoints, and how to keep the checkpointed state consistent. The traditional static checkpoint-interval model can set up checkpoints in a basic way, but because it does not adjust the interval as the underlying probability distribution changes, it incurs high overhead in practice. This thesis presents a dynamic, non-equidistant checkpoint-interval model that adjusts the checkpoint interval at run time and reduces overhead compared with the static method. Traditional global blocking protocols can easily ensure checkpoint state consistency, but as the number of processes grows, the delay caused by blocked waiting becomes larger.
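The difference between a static and a dynamically adjusted checkpoint interval can be illustrated with a minimal sketch. It is not the model from the thesis, which the abstract does not specify. It assumes Young's first-order approximation for the static interval, sqrt(2 * C * MTBF), and, purely for illustration, a Weibull failure distribution whose hazard rate at the current time replaces the fixed 1/MTBF. The names `young_interval`, `weibull_hazard`, and `dynamic_schedule` are hypothetical.

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Static interval from Young's first-order approximation."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def dynamic_schedule(total_time, checkpoint_cost, shape, scale, first_interval):
    """Return checkpoint times whose spacing tracks the current hazard rate.

    Assumption (not from the thesis): the local mean time between failures
    is taken to be 1 / h(t), and each gap is Young's interval for that value.
    """
    times = []
    t = first_interval
    while t < total_time:
        times.append(t)
        local_mtbf = 1.0 / weibull_hazard(t, shape, scale)
        t += young_interval(checkpoint_cost, local_mtbf)
    return times
```

With a shape parameter below 1 the hazard rate falls over time, so the gaps between checkpoints widen as the run progresses; with a shape of exactly 1 the failure process is memoryless and the schedule falls back to the static, equally spaced case.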
Because inter-process communication is random and uncertain in size, scope, and timing, we propose a grouping mechanism for processes, such that communication takes place within a group and not between groups. At checkpoint time, a non-blocking protocol is used between groups and a blocking protocol within each group, so that only part of the processes are blocked; this is the partially blocking consistency protocol. With these two optimizations, the design meets the dynamic fault-tolerance needs of large and complex computing environments better than traditional fault-tolerant systems. To verify the correctness and effectiveness of the proposed methods, we design and implement a process-level fault-tolerant system based on checkpoint optimization. Experimental results show that the system avoids the domino effect and, while preserving consistency, further reduces overhead, shortens actual execution time, and improves performance.
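One way to picture the grouping idea is to treat processes as nodes and observed messages as edges, and to take each connected component as a group, so that no message crosses a group boundary. The sketch below rests on that assumption. The abstract does not state the grouping criterion actually used, and the function names `group_processes` and `blocked_peers` are hypothetical.

```python
def group_processes(num_processes, messages):
    """Partition process ids 0..num_processes-1 into communication groups.

    `messages` is an iterable of (sender, receiver) pairs. Processes that
    exchange messages, directly or transitively, end up in the same group.
    """
    parent = list(range(num_processes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sender, receiver in messages:
        root_s, root_r = find(sender), find(receiver)
        if root_s != root_r:
            parent[root_r] = root_s

    groups = {}
    for p in range(num_processes):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


def blocked_peers(groups, initiator):
    """Processes that pause when `initiator` starts a checkpoint.

    Only the initiator's own group blocks; all other groups keep running.
    """
    for group in groups:
        if initiator in group:
            return group
    raise ValueError("unknown process id")
```

For example, with six processes and messages `[(0, 1), (1, 2), (3, 4)]`, the groups are `[[0, 1, 2], [3, 4], [5]]`, and a checkpoint started by process 3 blocks only processes 3 and 4, whereas a global blocking protocol would pause all six.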
Keywords/Search Tags: process, checkpoint, dynamic non-equidistant, partially blocking, consistency