The Research And Application Of Diskless Checkpointing Scheme For Lower Overhead

Posted on: 2016-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: E Q Yan
Full Text: PDF
GTID: 2428330473964833
Subject: Software engineering
Abstract/Summary:
Diskless checkpointing is an effective way to avoid the I/O bottleneck of disk-based checkpointing while tolerating a small number of node failures in large distributed systems. In contrast to disk-based checkpointing, diskless checkpointing stores checkpoints in the memory of the nodes in a distributed manner rather than on stable disk storage, which eliminates the I/O bottleneck. However, checkpoints held in memory become unavailable when one or more faults occur. To guarantee reliability, encoding techniques are applied: the checkpoints from all nodes are encoded into one or several redundant encoded checkpoints, which are stored in the memory of dedicated nodes. When one or more faults occur, the checkpoints held by the failed nodes are lost, but they can be recovered from the redundant encoded checkpoints on the dedicated nodes together with the checkpoints on the surviving nodes.

The existing encoding schemes used by diskless checkpointing, however, incur a high communication overhead because they encode across nodes. This hinders the adoption of diskless checkpointing, especially in systems with limited network bandwidth. This thesis proposes a vertical encoding scheme that addresses the problem by partitioning each checkpoint and confining the encoding to within a node. With this approach, when a diskless checkpoint is taken, each node only needs to send other nodes an amount of data equal to one checkpoint size plus several redundant encoded blocks. During recovery, each node needs to collect at most one checkpoint size of data from other nodes. In both cases, the approach introduces no extra encoding overhead.

The experimental results show that the vertical encoding scheme is an effective way to reduce communication overhead, especially in scenarios that tolerate a small number of node failures in a large distributed system. The scheme is also scalable, in the sense that the overhead of surviving k node failures among n nodes does not increase as n grows, provided n is not extremely large. Moreover, from the perspective of a single node, the vertical encoding scheme avoids the use of dedicated nodes while consuming approximately the same memory as existing encoding schemes when n is much larger than k. The approach can also be applied to multi-level diskless checkpointing for further gains.

In addition, based on an analysis of what fault tolerance in diskless checkpointing and in distributed storage systems have in common, a fault tolerance scheme built on the encoding scheme is proposed for distributed storage systems. It uses a duplicate (replication) scheme and an encoding scheme together. The former protects hot data against loss and at the same time serves as transfer storage for the encoding scheme, which provides better performance. The latter protects cold data, which improves storage utilization. In this way, the approach balances performance and storage utilization.
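
The encode-and-recover idea described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible code, a single XOR parity checkpoint that tolerates one node failure; the abstract does not state which code the thesis uses. The sketch shows the conventional cross-node encoding with one dedicated node, not the proposed vertical encoding scheme, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(checkpoints):
    """Build the redundant parity checkpoint (assumed: held by a dedicated node)."""
    return xor_blocks(checkpoints)


def recover(surviving, parity):
    """Rebuild the one lost checkpoint from the surviving checkpoints and the parity."""
    return xor_blocks(surviving + [parity])


# Illustrative in-memory checkpoints of four nodes (equal length by assumption).
checkpoints = [b"node0", b"node1", b"node2", b"node3"]
parity = encode(checkpoints)

# Simulate the failure of node 2: its in-memory checkpoint is gone.
lost = 2
surviving = [c for i, c in enumerate(checkpoints) if i != lost]
restored = recover(surviving, parity)
```

In this sketch every node must ship its whole checkpoint to the dedicated node to compute the parity, which is the kind of cross-node communication cost the abstract attributes to existing schemes.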
Keywords/Search Tags: Diskless checkpointing, Encoding scheme, Communication overhead, Vertical encoding scheme