
Reducing the performance impact of ensuring soft error reliability in high performance microprocessors

Posted on: 2011-10-27
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Binghamton
Candidate: Kumar, Sumeet
GTID: 1448390002965108
Subject: Engineering
Abstract/Summary:
In this dissertation we identify the key reasons for the performance and energy impact of redundant multi-threading (RMT), and we propose techniques that improve the performance and address the energy concerns of several flavors of RMT.

In our first approach, we reduce resource redundancy to mitigate the performance impact of redundancy. We investigate two techniques: (i) a register bits reuse technique, which stores both copies of the same instruction's result in one register (in different bits) when the result is small, and (ii) a register value reuse technique, which uses the same register for a main instruction and a distinct redundant instruction when both instructions produce the same result. These techniques, along with some others, reduce redundancy in the register file, reorder buffer, and load/store buffer. We evaluate their performance, power, and vulnerability impact on an RMT processor. Our experiments show that the techniques achieve about 95% performance improvement and about 17% energy reduction, while the vulnerability of the RMT processor remains unchanged.

Our second approach reduces instruction redundancy (the set of instructions that are redundantly executed) to mitigate the performance and energy impact of redundancy. We experiment with a decoupled RMT approach in which the frontend pipeline stages are protected by error codes, while the backend pipeline stages are protected by redundant execution. In this approach, we define two categories of instructions: self-checking and semi self-checking instructions. Self-checking instructions are those whose results are checked for errors when their "main" copies are executed; these instructions are not redundantly executed.
Semi self-checking instructions are those for which a major part of the result is checked when the "main" copy is executed, while the remaining part is checked using a small amount of additional hardware. Reducing instruction redundancy in this way retains the same fault coverage as the base architecture, in which all instructions are redundantly executed. Our experiments show that the techniques reduce instruction redundancy by about 58% and recover about 51% of the performance lost to redundant execution. They also recover about 40% of the increase in energy consumption in the key data-path structures.

In our third approach, we explore speculative mechanisms that trade off reliability for performance in RMT. Our basic approach validates the execution of an instruction by comparing its result against an expected result, and only the instructions that fail this validation are redundantly executed. This mechanism is expected to have a minimal vulnerability impact, because an erroneous result is highly unlikely to match the expected value. We also propose several extensions to the basic approach that further explore the performance-reliability trade-off design space. A combination of these techniques incurs a performance impact of about 10% and an undetected instruction error rate of about 0.09%, compared to a performance impact of about 25% for RMT with no undetected errors.

Finally, we propose selective redundancy to achieve an optimal performance-reliability trade-off. We show that RMT-based trade-off techniques are a sub-optimal design choice, because they duplicate an instruction throughout the entire pipeline. We show that, for each hardware structure in the pipeline, a different set of instructions contributes most to its architectural vulnerability factor (AVF). We define the Instruction Criticality Factor (ICF) as the percentage contribution of an instruction to the total AVF of a particular hardware structure.
We propose schemes that provide selective redundancy based on ICF, and we observe that duplicating only a few instructions with high ICF values reduces the AVF of a structure significantly. Our results show that, for the integer register file (RF), providing redundancy to 23% of instructions reduces its AVF by 80%, whereas for the floating-point RF, duplicating 24% of entries reduces its AVF by 33%. For the integer cluster of the IQ, 11% duplication reduces its AVF by 44%; for the floating-point cluster, duplicating 7% of instructions reduces its AVF by 43%. When ICF-based selective redundancy is applied to both the IQ and the RF, performance drops by 8% compared to the base single-threaded processor, while error coverage remains the same. (Abstract shortened by UMI.)...
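The first approach's two register-file techniques can be pictured with a small Python sketch. It is illustrative only and rests on assumptions that are not stated in the abstract: 64-bit physical registers, a result counting as "small" when it fits in a signed 32-bit half, and a hypothetical `ValueReuseTable` that maps a live value to the register already holding it. The helper names (`fits_in_half`, `pack_copies`, `copies_match`, `registers_needed`) are likewise hypothetical.

```python
"""Sketch of register bits reuse and register value reuse (first approach).

Assumptions (not from the dissertation): 64-bit physical registers, a result is
"small" if it fits in a signed 32-bit half, and registers are numbered 0, 1, 2, ...
"""

REG_BITS = 64                    # assumed physical register width
HALF = REG_BITS // 2             # bits given to each copy when packed
HALF_MASK = (1 << HALF) - 1


def fits_in_half(value: int) -> bool:
    """True if a signed result can be represented in HALF bits."""
    return -(1 << (HALF - 1)) <= value < (1 << (HALF - 1))


def pack_copies(main: int, redundant: int) -> int:
    """Register bits reuse: main copy in the low half, redundant copy in the high half."""
    return ((redundant & HALF_MASK) << HALF) | (main & HALF_MASK)


def copies_match(packed: int) -> bool:
    """Check step: both halves of the shared register must be bit-identical."""
    return (packed & HALF_MASK) == ((packed >> HALF) & HALF_MASK)


def registers_needed(result: int) -> int:
    """One shared register for a small result, two separate registers otherwise."""
    return 1 if fits_in_half(result) else 2


class ValueReuseTable:
    """Register value reuse (hypothetical model): an instruction whose result equals
    a value already held in a live register is mapped to that register."""

    def __init__(self) -> None:
        self.reg_of_value: dict[int, int] = {}
        self.refcount: dict[int, int] = {}
        self.next_free = 0  # simplification: registers are never recycled here

    def allocate(self, value: int) -> int:
        reg = self.reg_of_value.get(value)
        if reg is None:
            # No live register holds this value yet: take a fresh one.
            reg = self.next_free
            self.next_free += 1
            self.reg_of_value[value] = reg
            self.refcount[reg] = 0
        self.refcount[reg] += 1
        return reg


if __name__ == "__main__":
    packed = pack_copies(42, 42)
    print(hex(packed), copies_match(packed))   # 0x2a0000002a True
    print(copies_match(packed ^ (1 << 3)))     # single bit flip -> False
    table = ValueReuseTable()
    print(table.allocate(7), table.allocate(7), table.allocate(9))  # 0 0 1
```

In this model a single bit flip in either half makes the two copies disagree, so the check still catches it while the pair occupies one register instead of two.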
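The third approach's basic mechanism (validate a result against an expected value, and run the redundant copy only when validation fails) can be sketched as follows. The abstract does not say where the expected value comes from; this sketch assumes a simple last-value predictor indexed by instruction address, and the class and method names (`SpeculativeValidator`, `commit`, `reexecute`) are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class SpeculativeValidator:
    """Validate each result against an expected value; re-execute only on mismatch.

    Assumption: the expected value comes from a last-value predictor indexed by
    the instruction address (pc). This is an illustrative choice, not the
    dissertation's specified predictor.
    """
    last_value: dict[int, int] = field(default_factory=dict)
    redundant_executions: int = 0

    def commit(self, pc: int, result: int, reexecute: Callable[[], int]) -> int:
        expected = self.last_value.get(pc)
        if expected != result:
            # Validation failed (or no prediction exists yet): run the redundant copy.
            self.redundant_executions += 1
            if reexecute() != result:
                raise RuntimeError(f"soft error detected at pc={pc:#x}")
        # Either validation passed or the redundant copy agreed: the result commits.
        self.last_value[pc] = result
        return result


if __name__ == "__main__":
    v = SpeculativeValidator()
    for value in [5, 5, 5, 6]:
        v.commit(0x400, value, lambda value=value: value)
    print(v.redundant_executions)  # 2
```

Only the first and last results trigger a redundant execution here. The trade-off the abstract describes is visible in the first branch: a corrupted result that happens to equal the prediction commits without a redundant check, which is why this scheme has a small, nonzero undetected error rate.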
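Following the ICF definition, an instruction's ICF for a structure is its AVF contribution divided by the structure's total AVF, times 100. The sketch below computes that and then, as an assumed selection policy, duplicates the highest-ICF instructions up to a budget fraction. It further assumes that duplicating an instruction removes its entire AVF contribution. The function names and the toy contribution values are hypothetical and are not results from the dissertation.

```python
def icf(avf_contrib: dict[str, float]) -> dict[str, float]:
    """ICF of each instruction: its percentage share of the structure's total AVF."""
    total = sum(avf_contrib.values())
    if total == 0:
        return {name: 0.0 for name in avf_contrib}
    return {name: 100.0 * c / total for name, c in avf_contrib.items()}


def select_for_redundancy(
    avf_contrib: dict[str, float], budget: float
) -> tuple[list[str], float]:
    """Duplicate the highest-ICF instructions, up to `budget` (a fraction) of all
    instructions. Returns the chosen instructions and the percentage of the
    structure's AVF they account for, assuming duplication removes it entirely."""
    scores = icf(avf_contrib)
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = ranked[: int(len(ranked) * budget)]
    return chosen, sum(scores[name] for name in chosen)


if __name__ == "__main__":
    # Toy AVF contributions for one structure; illustrative numbers only.
    contrib = {"i0": 40, "i1": 25, "i2": 15, "i3": 10, "i4": 5, "i5": 3, "i6": 1, "i7": 1}
    print(select_for_redundancy(contrib, budget=0.25))  # (['i0', 'i1'], 65.0)
```

With a skewed distribution like this toy one, duplicating a quarter of the instructions covers well over half of the structure's AVF. That skew is the property the selective-redundancy argument relies on, and it is why per-structure selection can beat duplicating every instruction across the whole pipeline.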
Keywords/Search Tags:Performance, RMT, Error, Reduces its AVF, Instructions, Reducing, Redundancy, Techniques