
Optimization Strategies For Storage In Distributed Checkpoint System

Posted on: 2017-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
Full Text: PDF
GTID: 2428330488471856
Subject: Computer Science and Technology
Abstract/Summary:
As computer hardware continues to improve, the scale and complexity of supercomputing and cloud computing architectures keep growing. Larger scale, however, brings more frequent faults. In a distributed system, for example, the failure of one component can bring down the whole computation, which wastes a tremendous amount of resources. Fault tolerance has therefore become an indispensable requirement for avoiding this waste. Checkpoint-restart is a useful fault-tolerance method that has been widely used in computer and database systems. Checkpointing writes the program state held in memory to persistent storage. If a crash occurs, the previously saved checkpoint is used to restore the program to its most recently saved state, avoiding the waste of computing time and resources. As the scale of computation grows, however, checkpoints must be taken more frequently and checkpoint files become larger, which poses a serious challenge to storage scalability.

In this work, we analyze the contents of checkpoint files and the redundancy within them. A checkpoint file stores the process descriptor, the contents of the process address space, and register data; the address space contents account for most of the storage. We examine the address space segment by segment to find where the redundancy lies. The analysis shows that the stack segment contains a large amount of duplicate data. For distributed applications, the heap segments of different processes share a large amount of duplicate data, and the contents of their dynamic link libraries and code segments are identical.

Based on these characteristics, we propose a method to reduce checkpoint file size: when a checkpoint is taken, different strategies are applied to different parts of the checkpoint file according to their redundancy. We design and implement a checkpoint system based on DMTCP, an open-source checkpoint-restart package, and evaluate its performance experimentally. The results show that checkpoint file size is reduced by about 20% for single-node programs and by about 47% for distributed programs.
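
The abstract does not spell out the concrete strategies, so the following is only a minimal sketch of one plausible approach: content-hash deduplication of fixed-size pages across the memory images of several processes. The 4096-byte page size, the SHA-256 digest, and every function name here (`split_pages`, `dedup_checkpoint`, `restore`, `saved_fraction`) are illustrative assumptions, not part of the thesis or of DMTCP.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page granularity; the abstract does not state one


def split_pages(data, page_size=PAGE_SIZE):
    """Yield fixed-size chunks of a memory image (the last may be shorter)."""
    for offset in range(0, len(data), page_size):
        yield data[offset:offset + page_size]


def dedup_checkpoint(images, page_size=PAGE_SIZE):
    """Store each distinct page once.

    images maps a hypothetical image name (e.g. one per process) to its bytes.
    Returns (store, recipes): store maps digest -> page contents, and
    recipes maps image name -> ordered list of digests.
    """
    store = {}
    recipes = {}
    for name, data in images.items():
        recipe = []
        for page in split_pages(data, page_size):
            digest = hashlib.sha256(page).digest()
            store.setdefault(digest, page)  # keep only the first copy
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild one memory image from its ordered digest list."""
    return b"".join(store[digest] for digest in recipe)


def saved_fraction(images, store):
    """Fraction of raw bytes that no longer needs to be written."""
    raw = sum(len(data) for data in images.values())
    kept = sum(len(page) for page in store.values())
    return 1 - kept / raw if raw else 0.0
```

In this sketch, two processes that map the same code or library pages contribute those pages to the store only once, which mirrors the observation that code and dynamic link library segments are identical across processes. A real system would also have to account for the space taken by the digest lists and decide per segment whether deduplication is worth its cost.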
Keywords/Search Tags: Fault Tolerance, Checkpoint Technology, Checkpoint File Size, Redundancy