Cost-effective designs for supporting correct execution and scalable performance in many-core processors

Posted on:2011-02-11

Degree:Ph.D

Type:Thesis

University:Duke University

Candidate:Romanescu, Bogdan Florin

GTID:2448390002950785

Subject:Engineering

Abstract/Summary:

Many-core processors offer new levels of on-chip performance by capitalizing on the increasing rate of device integration. Harnessing the full performance potential of these processors requires that hardware designers not only exploit the advantages, but also consider the problems introduced by the new architectures. Such challenges arise from both the processor's increased structural complexity and the reliability issues of the silicon substrate. In this thesis, we address these challenges in a framework that targets correct execution and performance on three coordinates: (1) tolerating permanent faults, (2) facilitating static and dynamic verification through precise specifications, and (3) designing scalable coherence protocols.

First, we propose CCA, a new design paradigm for increasing the processor's lifetime performance in the presence of permanent faults in cores. CCA chips rely on a reconfiguration mechanism that allows cores to replace faulty components with fault-free structures borrowed from neighboring cores. In contrast with existing solutions for handling hard faults that simply shut down cores, CCA aims to maximize the utilization of defect-free resources and increase the availability of on-chip cores. We implement three-core and four-core CCA chips and demonstrate that they offer a cumulative lifetime performance improvement of up to 65% for industry-representative utilization periods. In addition, we show that CCA benefits systems that employ modular redundancy to guarantee correct execution by increasing their availability.

Second, we target the correctness of the address translation system. Current processors often exhibit design bugs in their translation systems, and we believe one cause for these faults is a lack of precise specifications describing the interactions between address translation and the rest of the memory system, especially memory consistency. We address this aspect by introducing a framework for specifying translation-aware consistency models. As part of this framework, we identify the critical role played by address translation in supporting correct memory consistency implementations. Consequently, we propose a set of invariants that characterizes address translation. Based on these invariants, we develop DVAT, a dynamic verification mechanism for address translation. We demonstrate that DVAT is efficient in detecting translation-related faults, including several that mimic design bugs reported in processor errata. By checking the correctness of the address translation system, DVAT supports dynamic verification of translation-aware memory consistency.

Finally, we address the scalability of translation coherence protocols. Current software-based solutions for maintaining translation coherence adversely impact performance and do not scale. We propose UNITD, a hardware coherence protocol that supports scalable performance and architectural decoupling. UNITD integrates translation coherence within the regular cache coherence protocol, such that TLBs participate in the cache coherence protocol similar to instruction or data caches. We evaluate snooping and directory UNITD coherence protocols on processors with up to 16 cores and demonstrate that UNITD reduces the performance penalty of translation coherence to almost zero.
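
The abstract states only that, under UNITD, TLBs take part in the regular cache coherence protocol; it does not give the mechanism. As a rough illustration of that idea, and not the thesis's actual design, the sketch below assumes that each TLB entry is tagged with the physical address of the page table entry it was loaded from, so that a coherence invalidation for that address can remove stale translations without software intervention. The names `CoherentTLB`, `snoop_invalidate`, and `write_pte`, the 4 KiB page size, and the broadcast (snooping-style) delivery are all hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, not taken from the abstract


@dataclass
class TLBEntry:
    virtual_page: int
    physical_page: int
    pte_address: int  # assumed tag: physical address of the source page table entry


@dataclass
class CoherentTLB:
    """Toy TLB that reacts to coherence invalidations (hypothetical model)."""
    entries: dict = field(default_factory=dict)  # virtual_page -> TLBEntry

    def fill(self, virtual_page, physical_page, pte_address):
        # Cache a translation together with the address of the PTE it came from.
        self.entries[virtual_page] = TLBEntry(virtual_page, physical_page, pte_address)

    def lookup(self, virtual_address):
        # Return the physical address, or None on a TLB miss.
        entry = self.entries.get(virtual_address // PAGE_SIZE)
        if entry is None:
            return None
        return entry.physical_page * PAGE_SIZE + virtual_address % PAGE_SIZE

    def snoop_invalidate(self, physical_address):
        # Drop every translation loaded from the PTE at physical_address.
        stale = [vp for vp, e in self.entries.items()
                 if e.pte_address == physical_address]
        for vp in stale:
            del self.entries[vp]
        return len(stale)


def write_pte(tlbs, pte_address):
    """Model a store to a PTE: every TLB observes the invalidation."""
    return sum(tlb.snoop_invalidate(pte_address) for tlb in tlbs)


# Four cores cache the same translation; one PTE write invalidates all copies.
tlbs = [CoherentTLB() for _ in range(4)]
for tlb in tlbs:
    tlb.fill(virtual_page=5, physical_page=9, pte_address=0x1000)
removed = write_pte(tlbs, 0x1000)
```

In this toy model a later `lookup` on the affected page misses and would trigger a fresh page table walk, which is the behavior a software shootdown otherwise has to enforce explicitly.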

Keywords/Search Tags:

Performance, Processors, Translation, Correct execution, Coherence, UNITD, Cores, CCA

Related items

1	Efficient throughput cores for asymmetric manycore processors
2	Research And Design Of Cache Coherence For Multi-core Processors
3	Configurable Energy-efficient Co-processors to Scale the Utilization Wall
4	Performance bound energy efficient cache organization for multi-core processors: A comparison of private and shared cache
5	Research On Functional Verification Method Of IP Cores And Embedded Processors For SoC
6	Correct Communication in Multi-core Processors
7	Designing heterogeneous many-core processors to provide high performance under limited chip power budget
8	Statistical machine learning based modeling framework for design space exploration and run-time cross-stack energy optimization for many-core processors
9	Integration and evaluation of cache coherence protocols for multiprocessor SoCs
10	Dynamic Stabilization Of MIPS Rates For Out-of-Order Processors