
Context-aware coherence protocols for future processors

Posted on: 2008-02-24
Degree: Ph.D.
Type: Dissertation
University: The University of Utah
Candidate: Cheng, Liqun
GTID: 1445390005979473
Subject: Computer Science
Abstract/Summary:
The semiconductor industry is experiencing a shift from "computation-bound design" to "communication-bound design." Many future systems will use one or more chip multiprocessors (CMPs) and support shared memory, because CMP-based systems can provide high-performance, cost-effective computing for workloads with abundant thread-level parallelism. One of the biggest challenges in CMP design is keeping caches coherent efficiently.
Most commercial products implement directory-based protocols to maintain coherence. Directory-based protocols avoid the bandwidth and electrical limits of a centralized interconnect by tracking the global coherence state of each cache line in a directory structure, and they are preferred for future systems because they can exploit arbitrary point-to-point interconnects. However, existing directory-based protocols are not optimal in specific contexts. For example, they cannot tailor coherence traffic to the sharing patterns an application exhibits, or to the differing latency and bandwidth needs of different coherence messages, which leads to suboptimal performance and extra power consumption.
To overcome these limitations of conventional directory-based protocols, we propose two cache coherence protocols that optimize coherence traffic based on context knowledge. After automatically detecting a stable producer-consumer pattern in an application, our sharing-pattern-aware coherence protocol uses directory delegation to move the "home directory" of a cache line to the producer node. This converts 3-hop coherence operations (requester to home, home to producer, producer to requester) into 2-hop operations (requester to producer, which now acts as home, and back). The producer then employs speculative updates to push the data to where it is likely to be consumed soon. When the producer correctly predicts when and where to send updates, these 3-hop misses become local misses, effectively eliminating the impact of remote memory latency.
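The detection, delegation, and speculative-update steps above can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the class `SharingPatternDirectory`, the threshold `STABLE_EPOCHS`, the rule that a pattern is "stable" once the same producer writes and the same consumer set reads for three consecutive write epochs, and the rule that a different writer cancels delegation are all hypothetical choices made for illustration. Hop counts assume the requester, the original home, and the producer are three distinct nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical threshold: consecutive write epochs with an identical
# consumer set before a line's sharing pattern is treated as stable.
STABLE_EPOCHS = 3


@dataclass
class LineState:
    """Per-line sharing history (hypothetical, for illustration only)."""
    producer: Optional[int] = None                   # node that wrote the line last
    readers: Set[int] = field(default_factory=set)   # nodes that read it since that write
    last_consumers: Optional[FrozenSet[int]] = None  # consumer set of the previous epoch
    repeat_count: int = 0                            # consecutive epochs with that set
    delegated_to: Optional[int] = None               # producer acting as home, if any
    pushed: Set[int] = field(default_factory=set)    # nodes holding a speculative update


class SharingPatternDirectory:
    """Hypothetical model of producer-consumer detection plus delegation."""

    def __init__(self) -> None:
        self.lines: Dict[int, LineState] = {}

    def _state(self, addr: int) -> LineState:
        return self.lines.setdefault(addr, LineState())

    def on_read(self, addr: int, node: int) -> None:
        s = self._state(addr)
        if node != s.producer:
            s.readers.add(node)

    def on_write(self, addr: int, node: int) -> Set[int]:
        """Record a write; return the nodes that receive a speculative update."""
        s = self._state(addr)
        if s.producer is not None and node != s.producer:
            # A different writer breaks the pattern: reset history and undelegate.
            s.last_consumers, s.repeat_count, s.delegated_to = None, 0, None
        elif s.producer is not None and s.readers:
            consumers = frozenset(s.readers)
            if consumers == s.last_consumers:
                s.repeat_count += 1
            else:
                s.last_consumers, s.repeat_count = consumers, 1
        s.producer, s.readers = node, set()
        if s.delegated_to is None and s.repeat_count >= STABLE_EPOCHS:
            s.delegated_to = node  # directory delegation: producer becomes home
        # A delegated producer pushes the new data to its predicted consumers.
        s.pushed = set(s.last_consumers) if s.delegated_to == node else set()
        return set(s.pushed)

    def read_miss_hops(self, addr: int, requester: int) -> int:
        """Network hops to read a line that is dirty at the producer."""
        s = self._state(addr)
        if requester in s.pushed:
            return 0  # data already pushed: serviced locally
        if s.delegated_to is not None:
            return 2  # requester -> producer (acting home) -> requester
        return 3      # requester -> home -> producer (owner) -> requester


d = SharingPatternDirectory()
hops = []
for _ in range(6):
    d.on_write(0x40, node=1)                      # producer P1 updates the line
    hops.append(d.read_miss_hops(0x40, requester=2))
    d.on_read(0x40, node=2)                       # consumer P2 reads it
print(hops)                                       # [3, 3, 3, 0, 0, 0]
print(d.read_miss_hops(0x40, requester=3))        # 2
```

In this toy run the consumer's first three misses cost three hops; once the pattern is judged stable, delegation and speculative updates let the predicted consumer read locally, while an unpredicted reader still pays two hops to reach the producer acting as home.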
Our interconnect-aware protocol exploits a heterogeneous interconnect composed of wires with different latency, bandwidth, and energy characteristics. By mapping critical messages to wires optimized for delay and noncritical messages to wires optimized for low power, it improves performance and reduces power at the same time.
We demonstrate the benefits of the proposed mechanisms through architecture-level simulation. Using a cycle-accurate simulator of a future 16-node SGI multiprocessor, the producer-consumer sharing-aware protocol reduces the average remote miss rate by 40%, reduces network traffic by 15%, and improves performance by 21% on seven benchmark programs that exhibit producer-consumer sharing. On a set of scientific and commercial benchmarks for a future 16-core CMP system, the interconnect-aware protocol improves performance by 11% and reduces energy by 22%.
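The message-to-wire mapping of the interconnect-aware protocol can be illustrated with a toy cost model. Everything in this sketch is an assumption: the wire classes `L_WIRES`, `B_WIRES`, and `PW_WIRES`, their latency, width, and energy parameters, the message kinds and sizes, and the policy in `choose_wire` are hypothetical and do not reproduce the protocol's actual mapping or measured wire characteristics.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class WireClass:
    name: str
    latency: int           # cycles per hop (hypothetical)
    width_bits: int        # bits transferred per cycle (hypothetical)
    energy_per_bit: float  # arbitrary energy units (hypothetical)


# Illustrative parameters only; not taken from the dissertation.
L_WIRES = WireClass("L", latency=2, width_bits=24, energy_per_bit=1.0)     # fast, narrow
B_WIRES = WireClass("B", latency=4, width_bits=256, energy_per_bit=1.0)    # baseline
PW_WIRES = WireClass("PW", latency=6, width_bits=256, energy_per_bit=0.5)  # low power, slow


@dataclass(frozen=True)
class Message:
    kind: str
    size_bits: int
    critical: bool  # on the critical path of an outstanding miss?


def choose_wire(msg: Message) -> WireClass:
    """Hypothetical policy: small critical messages use fast narrow wires,
    other critical messages use baseline wires, and noncritical messages
    use power-optimized wires."""
    if not msg.critical:
        return PW_WIRES
    if msg.size_bits <= L_WIRES.width_bits:
        return L_WIRES
    return B_WIRES


def transfer_cost(msg: Message, wire: WireClass) -> Tuple[int, float]:
    """(latency in cycles, energy) for one hop; wide messages are serialized."""
    cycles = wire.latency + math.ceil(msg.size_bits / wire.width_bits) - 1
    return cycles, msg.size_bits * wire.energy_per_bit


def summarize(trace: List[Message],
              policy: Callable[[Message], WireClass]) -> Dict[str, float]:
    """Sum critical-path latency and total energy for a message trace."""
    crit_latency, energy = 0, 0.0
    for msg in trace:
        cycles, units = transfer_cost(msg, policy(msg))
        energy += units
        if msg.critical:
            crit_latency += cycles
    return {"critical_latency": crit_latency, "energy": energy}


trace = [
    Message("ack", 16, critical=True),
    Message("read_request", 64, critical=True),
    Message("data_reply", 576, critical=True),
    Message("writeback", 576, critical=False),
    Message("unblock", 16, critical=False),
]
aware = summarize(trace, choose_wire)
baseline = summarize(trace, lambda msg: B_WIRES)
print(aware)     # {'critical_latency': 12, 'energy': 952.0}
print(baseline)  # {'critical_latency': 14, 'energy': 1248.0}
```

With these made-up parameters, moving the small critical acknowledgment onto fast wires shortens the critical path, while sending the noncritical writeback and unblock messages over power-optimized wires lowers energy. The figures only illustrate the trade-off and are unrelated to the reported 11% and 22% results.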
Keywords/Search Tags:Future, Coherence, Protocols