Copy-Free Checkpointing System For Non-Volatile Memory

Posted on: 2017-06-18    Degree: Master    Type: Thesis
Country: China    Candidate: X Gao    Full Text: PDF
GTID: 2348330503489867    Subject: Computer system architecture
Abstract/Summary:
Fault tolerance is a crucial problem for exascale systems. Checkpointing is commonly used to address it; however, in HDD-based high-performance computing systems, it incurs substantial overhead as checkpoint frequency increases. Memory-based checkpointing mitigates this overhead by sharing data between working memory and the checkpoint, but it introduces a new problem: it requires a cross-node data duplication mechanism to avoid losing data upon node failure, which in turn incurs communication overhead. Non-volatile memory (NVM) can remove that communication overhead. However, NVM's relatively poor write performance amplifies the cost of the extra writes incurred by the mechanism that keeps checkpoints consistent.

A copy-free checkpointing system is proposed to explore the potential of checkpointing on hybrid memory. Hybrid memory, composed of DRAM and NVM, is expected to be the next-generation memory system. By using Switch-on-Write and Twins Page Mapping, the copy-free checkpointing model resolves the extra-write problem. Twins Page Mapping maps one physical page to two phase-change memory (PCM) pages and splits these pages into cache lines. When a cache line that serves as both working memory and checkpoint is about to be written, Switch-on-Write transfers its working-memory role to its counterpart, namely the cache line at the same offset in the corresponding PCM page. The write request is then redirected to the counterpart, so the checkpoint stays consistent. Potential overheads in the implementation are also considered and handled.

The evaluation covers various workloads and shows that the copy-free checkpointing model adapts to different access patterns. It also demonstrates up to a 1.88x speedup and a 5.99x reduction in writes.
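The interplay of Twins Page Mapping and Switch-on-Write described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis implementation: the class `TwinPage`, its fields, the constant `LINES_PER_PAGE`, and the per-line role arrays are all hypothetical names, every write is assumed to replace a whole cache line, and the role metadata is kept in ordinary Python lists rather than persisted atomically as a real system would need.

```python
from dataclasses import dataclass, field

# Assumption: a 4 KiB page split into 64-byte cache lines.
LINES_PER_PAGE = 64


def _twin_storage():
    # Two PCM pages, each holding LINES_PER_PAGE cache lines.
    return [[None] * LINES_PER_PAGE for _ in range(2)]


def _roles():
    # Per-line index (0 or 1) of the PCM page that holds a given role.
    return [0] * LINES_PER_PAGE


@dataclass
class TwinPage:
    """Hypothetical model of one page mapped to two PCM pages."""
    pcm: list = field(default_factory=_twin_storage)
    working: list = field(default_factory=_roles)     # current data
    checkpoint: list = field(default_factory=_roles)  # last checkpoint
    pcm_writes: int = 0

    def read(self, line):
        return self.pcm[self.working[line]][line]

    def write(self, line, data):
        if self.working[line] == self.checkpoint[line]:
            # Switch-on-Write: the line is both working memory and
            # checkpoint, so hand the working role to its counterpart
            # instead of copying the old value somewhere first.
            self.working[line] ^= 1
        # Full-line write lands on the working copy only.
        self.pcm[self.working[line]][line] = data
        self.pcm_writes += 1

    def take_checkpoint(self):
        # Copy-free: only the role metadata changes, no line data moves.
        self.checkpoint = list(self.working)

    def restore(self):
        # Roll back by pointing the working role at the checkpoint copy.
        self.working = list(self.checkpoint)


page = TwinPage()
page.write(3, "v1")
page.take_checkpoint()
page.write(3, "v2")   # redirected to the counterpart line
page.write(3, "v3")   # already separate from the checkpoint, in place
page.restore()
print(page.read(3), page.pcm_writes)
```

In this model each logical write costs exactly one PCM line write, whereas a copy-on-write scheme would add a copy of the old line on the first write after each checkpoint; that difference is the "extra write" the abstract refers to.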
Keywords/Search Tags:Fault Tolerance, Checkpointing, Non-Volatile Memory, Phase-Change Memory, Hybrid-Memory System