
Research On Parallel Re-computation For Hybrid Memory

Posted on: 2020-02-02  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Li
GTID: 2428330590958330  Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud computing, big data, and artificial intelligence, massive amounts of data are generated every second. To process these data quickly and effectively, computers are equipped with ever more processors. However, as parallel applications make full use of these processors, the number of faults also increases. The running time of many applications exceeds the mean time between failures (MTBF), and once a failure occurs, the run of the application is invalidated. Parallel applications should therefore be fault tolerant. Checkpointing is a common fault-tolerance technique, but checkpoint storage, recovery speed, and the impact on system performance all remain important problems. Diskless checkpointing eliminates the disk I/O performance bottleneck, but it consumes a large amount of memory.

To address these problems, we propose PRec (Parallel Re-computation) for hybrid memory. In a hybrid memory composed of DRAM (Dynamic Random Access Memory) and NVM (Non-Volatile Memory), PRec obtains data by recomputing it instead of writing it to NVM. This reduces the storage consumption of checkpoints and extends the lifetime of NVM. PRec also speeds up recovery through parallel computation, which benefits system performance. PRec divides the original code into code chunks in order to determine the positions of re-computation labels and check labels, and to identify the data that must be saved in checkpoints. On this basis, it saves application-level checkpoints and recovers the system through parallel computation.

We conducted our experiments using Hewlett Packard's Quartz emulator and comprehensively evaluated the performance of PRec with the NAS Parallel Benchmarks, comparing PRec against CRIU. PRec reduces the storage overhead of checkpoints, the number of NVM accesses, and energy consumption, and it speeds up the recovery process. Experimental results demonstrate the efficacy and efficiency of PRec.
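The core trade-off behind re-computation (saving a chunk's small inputs rather than its large outputs, then regenerating the outputs in parallel during recovery) can be illustrated with a toy Python sketch. This is not PRec's implementation. It assumes that each code chunk is a deterministic, independent function, and it ignores label placement, DRAM/NVM placement, and MPI entirely. The names `squares`, `run_and_checkpoint`, and `recover` are hypothetical. Threads stand in for parallel workers here; CPU-bound recovery would in practice use processes or MPI ranks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a "code chunk" is a deterministic function whose large
# output can be regenerated from a small saved input (its re-computation input).
ChunkFunc = Callable[[Tuple[int, int]], List[int]]


def squares(bounds: Tuple[int, int]) -> List[int]:
    """Example chunk: expands a 2-integer input into a large output."""
    start, stop = bounds
    return [i * i for i in range(start, stop)]


def run_and_checkpoint(chunks: Dict[str, Tuple[int, int]], func: ChunkFunc):
    """Run every chunk, but checkpoint only its inputs, not its outputs."""
    outputs = {name: func(args) for name, args in chunks.items()}
    checkpoint = dict(chunks)  # small: one (start, stop) pair per chunk
    return outputs, checkpoint


def recover(checkpoint: Dict[str, Tuple[int, int]], func: ChunkFunc,
            workers: int = 4) -> Dict[str, List[int]]:
    """Rebuild lost outputs by recomputing all chunks concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(func, args)
                   for name, args in checkpoint.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    chunks = {"c0": (0, 1000), "c1": (1000, 2000), "c2": (2000, 3000)}
    outputs, ckpt = run_and_checkpoint(chunks, squares)
    saved = sum(len(v) for v in ckpt.values())    # integers checkpointed
    full = sum(len(v) for v in outputs.values())  # integers in full state
    restored = recover(ckpt, squares)
    print(saved, full, restored == outputs)       # prints: 6 3000 True
```

In this toy case the checkpoint holds 6 integers instead of 3000. The cost is the extra computation needed during recovery, which the sketch spreads across workers. Real applications also have chunks whose inputs depend on earlier chunks, and those dependencies would determine which data must actually be saved.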
Keywords/Search Tags: Hybrid Memory, Fault Tolerance, Re-computation, Parallel Computing