Research On Fault Tolerance For Transactional Memory System

Posted on: 2012-11-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Song
Full Text: PDF
GTID: 1118330362460500
Subject: Computer Science and Technology
Abstract/Summary:
With the development of multi-core processors, transactional memory has attracted growing attention as a promising concurrency control mechanism. At the same time, as large-scale integrated circuits move into deep submicron and even nanometer process technologies, processors are becoming increasingly susceptible to electromagnetic radiation, cosmic rays and other sources of interference. This makes processor reliability an increasingly prominent problem, and fault tolerance in transactional memory systems has consequently become an important concern.

In this dissertation, we study fault tolerance in transactional memory systems. Building on a theoretical analysis of error propagation behavior in transactional memory systems, we propose theoretical methods, technical solutions and implementation frameworks for fault detection, fault recovery and fault masking. The dissertation makes the following contributions:

1. Starting from the error propagation behavior between statement sequences, we progressively analyze the error propagation behavior in transactional memory systems. We identify two kinds of fault-tolerance positions and their corresponding fault-tolerance objects, prove that they provide different fault-tolerance abilities, and reveal the fault-tolerance characteristics of transactional memory.

2. We propose an error detection method based on redundant transactions (EDRT). The method creates a redundant copy of every transaction, executes both the transaction and its copy, and detects errors by comparing the write sets of the two transactions before the commit operation. We also give the system constraints and design guidelines for applying EDRT to transactional memory systems with either eager or lazy data versioning, covering both how the error detection data sets are acquired and compared and how the conflict detection mechanism is handled. A set of experiments shows that EDRT achieves good error detection ability at low cost.

3. We propose a fault recovery method based on transaction rollback (FRTR). The method uses the data-versioning mechanism as the checkpoint and recovers from a fault by rolling back only the single faulty transaction. We show that this is sufficient for fault recovery by analyzing the isolation property of a transactional memory system that supports FRTR, and a set of experiments shows that FRTR has low cost. In addition, we introduce parallel recomputing into FRTR to reduce its cost further and provide a programming guide for parallel recomputing in OpenTM. A set of experiments shows that this optimization is effective.

4. We propose a fault-tolerance method based on triple redundancy (TriTM). The method introduces triple redundancy into the transactional memory system, uses the write sets of the transactions as the data comparison set, and implements low-cost fault masking. Exploiting the error correction ability of TriTM, we propose an optimization based on optimizing how comparison points are set in TriTM. In addition, we implement TriTM in a transactional memory system with closed nesting. A set of experiments shows that TriTM has low cost and that the optimized variant, Opti_TriTM, is effective.
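
The write-set comparison behind EDRT, the rollback-and-retry idea behind FRTR, and the majority vote behind TriTM can be illustrated with a small single-threaded sketch. This is not the dissertation's implementation. It assumes a toy model in which a transaction is a plain function that reads a snapshot and returns its buffered write set (lazy versioning), and it omits conflict detection entirely. All names (`run_buffered`, `edrt_with_rollback`, `tritm_commit`, `make_transfer`) are hypothetical, and the fault is simulated by flipping a bit on chosen executions.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet

Store = Dict[str, int]
Txn = Callable[[Store], Store]  # reads a snapshot, returns its write set


def run_buffered(txn: Txn, store: Store) -> Store:
    """Run txn on a private copy; its writes stay buffered until commit."""
    return txn(dict(store))


def edrt_with_rollback(txn: Txn, store: Store, max_retries: int = 3) -> bool:
    """Dual execution: commit only when both write sets agree.

    On a mismatch the buffered write sets are simply dropped, which leaves
    the store untouched (a rollback), and the transaction is re-executed.
    """
    for _ in range(max_retries + 1):
        primary = run_buffered(txn, store)
        shadow = run_buffered(txn, store)
        if primary == shadow:
            store.update(primary)  # commit
            return True
    return False


def tritm_commit(txn: Txn, store: Store) -> bool:
    """Triple execution: commit the write set that at least two runs agree on."""
    write_sets = [run_buffered(txn, store) for _ in range(3)]
    votes = Counter(tuple(sorted(ws.items())) for ws in write_sets)
    winner, count = votes.most_common(1)[0]
    if count < 2:
        return False  # no majority, so the fault cannot be masked
    store.update(dict(winner))
    return True


def make_transfer(faulty_runs: FrozenSet[int] = frozenset()) -> Txn:
    """Hypothetical transaction moving 10 units from 'a' to 'b'.

    Executions whose 1-based index is in faulty_runs get a simulated bit flip.
    """
    calls = {"n": 0}

    def transfer(snapshot: Store) -> Store:
        calls["n"] += 1
        amount = 10
        if calls["n"] in faulty_runs:
            amount ^= 4  # simulated transient fault
        return {"a": snapshot["a"] - amount, "b": snapshot["b"] + amount}

    return transfer
```

In this model, dual execution can only detect a disagreement and must re-execute to recover, whereas triple execution can commit the majority write set without re-executing. That difference is the distinction the abstract draws between fault detection with recovery and fault masking.
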
Keywords/Search Tags: Fault Tolerance, Transactional Memory, Fault Detection, Fault Recovery, Fault Masking