
The timekeeping methodology: Exploiting generational behavior to improve processor power and performance

Posted on: 2003-05-12
Degree: Ph.D.
Type: Thesis
University: Princeton University
Candidate: Hu, Zhigang
GTID: 2468390011479021
Subject: Engineering
Abstract/Summary:
Today's CPU designers face increasingly aggressive performance goals while also contending with challenging limits on power dissipation. This conflict between performance and power requirements makes simple but effective solutions to the widening gap between processor and memory performance increasingly important. My research has demonstrated how aspects of processor and memory behavior can be optimized by exploiting knowledge of the time durations between key processor and memory events. These "timekeeping" techniques can improve performance or reduce power using simple hardware structures.

In this thesis, the cache memory hierarchy serves as the main example illustrating the effectiveness of the timekeeping methodology. I start by introducing the basic concepts of this methodology, including the generational nature of cache reference streams. A generation of a cache line spans the time from when the line is filled into the cache until it is evicted. Its live time runs from the fill to the last access, and its dead time runs from the last access to the eviction. Using statistical distributions of key timekeeping metrics such as reload and access intervals, I show how these metrics form the basis for a rich set of policies that can classify and predict program behavior. From these metrics and predictions, hardware mechanisms can be built to optimize the power or performance of the on-chip memory hierarchy.

Three mechanisms are presented to illustrate the application of the timekeeping methodology to the memory system. The first mechanism, cache decay, can reduce cache leakage energy by a factor of four by identifying long-idle cache lines with simple 2-bit counters and turning them off. The second mechanism, a timekeeping victim cache filter, uses the same counters as cache decay to identify cache lines with short dead times and select them as candidates for the victim buffer. This mechanism can filter out 87% of victim buffer traffic while also improving performance. Both cache decay and the victim buffer filter exploit cache line lifetime behavior within a single generation.
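To make the first two mechanisms concrete, here is a minimal sketch of cache decay with per-line 2-bit counters, plus a victim filter that reads the same counters. It is not the thesis's hardware design. The class name `DecayCache`, the direct-mapped organization, the `tick_interval` between global ticks, and the `victim_threshold` admission rule are all hypothetical choices made for illustration. A global tick increments every live line's counter; any access resets it; a line whose counter is already saturated at 3 when the next tick arrives is switched off. On eviction, a low counter implies a short dead time, so the line is admitted to the victim buffer.

```python
class DecayCache:
    """Hypothetical sketch: direct-mapped cache with 2-bit decay counters."""

    MAX_COUNT = 3  # a 2-bit counter saturates at 3

    def __init__(self, num_lines: int, tick_interval: int, victim_threshold: int = 1):
        self.num_lines = num_lines
        self.tick_interval = tick_interval        # assumed cycles per global tick
        self.victim_threshold = victim_threshold  # assumed filter threshold (in ticks)
        self.tags = [None] * num_lines            # None means no valid data
        self.counters = [0] * num_lines           # per-line 2-bit decay counters
        self.powered = [False] * num_lines        # True while the line holds live data
        self.victim_buffer = []                   # addresses admitted by the filter
        self.cycle = 0

    def advance(self, cycles: int) -> None:
        """Advance time cycle by cycle; each global tick ages all powered lines."""
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.tick_interval == 0:
                self._global_tick()

    def _global_tick(self) -> None:
        for i in range(self.num_lines):
            if not self.powered[i]:
                continue
            if self.counters[i] == self.MAX_COUNT:
                # Idle for a whole decay interval: gate the line off (data is lost).
                self.powered[i] = False
                self.tags[i] = None
            else:
                self.counters[i] += 1

    def access(self, address: int) -> bool:
        """Return True on a hit. Any access resets the line's decay counter."""
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.powered[index] and self.tags[index] == tag:
            self.counters[index] = 0
            return True
        if self.powered[index]:
            # Evicting a live line: a low counter means it was touched recently,
            # i.e. a short dead time, so the filter admits it to the victim buffer.
            if self.counters[index] <= self.victim_threshold:
                self.victim_buffer.append(self.tags[index] * self.num_lines + index)
        # Fill the line (powering it back on if it had decayed).
        self.tags[index] = tag
        self.counters[index] = 0
        self.powered[index] = True
        return False

    def powered_lines(self) -> int:
        """Number of lines still drawing leakage power."""
        return sum(self.powered)
```

With `DecayCache(num_lines=4, tick_interval=10)`, a line touched once and then left alone survives three ticks (its counter reaches 3) and is switched off on the fourth, so a later access to it misses. Evicting a line right after it was used sends it to the victim buffer, while evicting a line that sat idle for two ticks does not. Real designs broadcast the global tick from a single cycle counter rather than looping over lines in software.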
In the third mechanism, timekeeping prefetch, we demonstrate how to exploit the regularity across consecutive generations of the same cache line. Timekeeping prefetch uses the live time and the next address observed in the previous generation as predictions for the current generation. The resulting prefetcher is both highly effective and hardware-efficient. With an 8KB history table, it achieves an average performance improvement of 11% across the whole SPEC2000 benchmark suite, outperforming a recent proposal that uses a 2MB history table.

Outside the memory system, this thesis also shows how the timekeeping methodology can be applied to other subsystems such as branch predictors. A key characteristic of branch predictor data is that it is transient and predictive: it consists of execution hints that do not affect program correctness, and it is often short-lived. To exploit this characteristic, we propose building branch predictors from naturally decaying 4-transistor memory cells instead of traditional 6-transistor cells. This implementation can reduce branch predictor leakage by about 60-80% while providing a cell area advantage of up to 33%.

The techniques presented in this thesis clearly demonstrate the power of the timekeeping methodology. We expect that, in our future work as well as in work by other researchers, more timekeeping techniques will be proposed to help future processors meet the many challenges in power and performance.
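The cross-generation idea behind timekeeping prefetch can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation. We assume "next address" means the address that replaced a line in the same cache frame. We also assume a prefetch is triggered once a line has been resident longer than its previous generation's live time. The class `TimekeepingPrefetcher` and its methods are hypothetical, the history table is an unbounded dictionary rather than a small fixed-size table such as the 8KB one mentioned above, and the sketch only records which prefetches would be issued.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class History:
    live_time: int     # cycles from fill to last access in the previous generation
    next_address: int  # address that replaced this line last time (assumed meaning)


@dataclass
class Frame:
    address: int
    fill_time: int
    last_access: int
    predicted: Optional[History] = None  # prediction carried over from last generation
    prefetch_issued: bool = False


class TimekeepingPrefetcher:
    """Hypothetical sketch: direct-mapped frames plus a per-address history table."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.frames: Dict[int, Frame] = {}
        self.history: Dict[int, History] = {}  # unbounded here; small in hardware
        self.issued: List[int] = []            # prefetch addresses, in issue order

    def access(self, address: int, now: int) -> None:
        index = address % self.num_frames
        frame = self.frames.get(index)
        if frame is not None and frame.address == address:
            frame.last_access = now  # hit: extend the current generation
            return
        if frame is not None:
            # A generation ends: record its live time and the line replacing it.
            self.history[frame.address] = History(
                live_time=frame.last_access - frame.fill_time,
                next_address=address,
            )
        # A new generation starts; reuse the previous generation's history, if any.
        self.frames[index] = Frame(address, now, now, self.history.get(address))

    def tick(self, now: int) -> None:
        """Issue a prefetch once a line outlives its predicted live time."""
        for frame in self.frames.values():
            p = frame.predicted
            if p is None or frame.prefetch_issued:
                continue
            if now - frame.fill_time > p.live_time:
                self.issued.append(p.next_address)
                frame.prefetch_issued = True
```

For example, if address 0 is filled at cycle 0, last used at cycle 5, and replaced by address 4 at cycle 20, the history records a live time of 5 and a next address of 4. When address 0 returns at cycle 30, calling `tick` at cycle 36 issues a prefetch for address 4, one cycle after the predicted live time has elapsed. Calling it at cycle 33 issues nothing.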
Keywords/Search Tags: Performance, Power, Timekeeping, Processor, CPU, Behavior, Cache, Exploit