
Increasing cache efficiency by eliminating noise and using restrictive compression techniques

Posted on: 2011-05-21
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Binghamton
Candidate: Pujara, Prateek
Full Text: PDF
GTID: 1448390002456920
Subject: Engineering
Abstract/Summary:
With the widening performance gap between the processor and memory, caches are increasingly important for high-performance processors. However, with shrinking feature sizes and increasing clock speeds, cache access latencies are growing, which limits the size of the level 1 cache that can be integrated on the chip. A limited-size cache can significantly increase the cache miss rate and thus reduce performance.

We investigate restrictive compression techniques for the level 1 data cache that avoid an increase in the cache access latency. The basic technique, All Words Narrow (AWN), compresses a cache block only if all the words in the block are narrow. We extend AWN to store a few upper half-words in a cache block (Additional Half-word Space, AHS), so that a small number of normal-sized words can also be accommodated, and we further make AHS adaptive, allocating the additional half-word space to the various cache blocks as needed. We also propose techniques to reduce the growth in tag space that is inevitable with compression. Together, these techniques increase the L1 data cache capacity (measured as the average number of valid cache blocks per cycle) by about 50% compared to a conventional cache, with little or no impact on the cache access time, and they have the potential to reduce the average L1 data cache miss rate by about 23%.
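To make the AWN condition concrete, the following C sketch tests whether a block qualifies for restrictive compression. The block size, word width, and definition of a narrow word (a 32-bit value that is just the sign extension of its lower half-word) are illustrative assumptions for the sketch, not parameters taken from the dissertation.

#include <stdint.h>
#include <stdbool.h>

#define WORDS_PER_BLOCK 8   /* assumed block size: 8 x 32-bit words */

/* A word is "narrow" if its upper half is just the sign extension of
 * its lower half, i.e. the value fits in a half-word. */
static bool is_narrow(uint32_t w)
{
    int16_t low = (int16_t)(w & 0xFFFF);
    return (int32_t)w == (int32_t)low;
}

/* AWN: a block may be stored compressed only if *every* word in it is
 * narrow, so the whole block fits in half-word storage. */
bool awn_compressible(const uint32_t block[WORDS_PER_BLOCK])
{
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        if (!is_narrow(block[i]))
            return false;
    return true;
}

/* AHS extension (sketch): tolerate up to `extra` normal-sized words per
 * block by reserving additional half-word space for their upper halves. */
bool ahs_compressible(const uint32_t block[WORDS_PER_BLOCK], int extra)
{
    int wide = 0;
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        if (!is_narrow(block[i]))
            wide++;
    return wide <= extra;
}

In the adaptive AHS scheme described above, the `extra` budget would not be fixed per block but allocated among blocks as demand requires.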
Caches are also utilized very inefficiently, because not all of the data brought into the cache to exploit spatial locality is actually used. We define cache utilization as the percentage of data brought into the cache that is actually used; our experiments show that the level 1 data cache has a utilization of only about 57%. Increasing the effectiveness of the cache by improving its utilization can significantly reduce cache energy consumption, reduce the bandwidth requirement, and make more space available for useful data.

We therefore focus on mechanisms that predict the unused data in a cache block (cache noise). These mechanisms use the word usage history of cache blocks to predict the useful words, so that only the useful data is fetched into the cache on a cache miss. In particular, we investigate three flavors of prediction: (i) phase context prediction, which considers the word usage history of the current phase of the program; (ii) memory context prediction, which considers the word usage history of contiguous memory locations; and (iii) code context prediction, which considers the word usage history of a contiguous set of instructions. The code context predictor gives the best predictability, about 95%, with a simple last-word-usage predictor.

Applying cache noise prediction to the L1 data cache improves cache utilization by about 37% and reduces cache energy consumption and bandwidth requirement by about 23% and 28%, respectively. Cache noise mispredictions increase the miss rate by only 0.1% and have almost no impact on the instructions per cycle (IPC). Compared to a sub-blocked cache, fetching only the to-be-referenced data identified by the cache noise predictor improves the miss rate and cache utilization by 97% and 44%, respectively, although the sub-blocked cache requires only about 35% of the bandwidth of the cache noise prediction based approach. We also observe that cache noise prediction significantly improves utilization and reduces the bandwidth requirement for prefetching.

We use this highly accurate prediction mechanism to fetch only the to-be-referenced data into the L1 data cache on a cache miss, and we use the cache space thus freed to store words from multiple cache blocks in a single physical cache block, increasing the number of useful words held in the cache. We also propose methods to combine this technique with a value-based approach to further increase the cache capacity. Our experiments show that these techniques achieve about 57% of the L1 data cache miss rate reduction and about 60% of the cache capacity increase obtained with a double-sized cache, with only about 25% cache space overhead.

Finally, we show the effect of our techniques on simultaneous multithreaded (SMT) processors, where cache blocks from multiple threads vie for the limited cache space; here our techniques achieve about 46% of the miss rate reduction achieved by a double-sized cache. We also examine context switching, where the cache contents of one context are polluted by another context. Our techniques increase the utilization of the L1 data cache from 30% in the context-switching base case to 80%, and they increase the cache capacity by about 55% compared to that base case.
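As a rough illustration of the code context predictor with last-word-usage prediction described above, the following C sketch keeps a small table indexed by the code region of the instruction that misses; each entry records which words of the fetched block were referenced last time, and that bit vector becomes the fetch mask on the next miss from the same code context. The table size, region granularity, and eight-word block are hypothetical choices for the sketch, not values from the dissertation.

#include <stdint.h>

#define WORDS_PER_BLOCK 8      /* assumed: 8 words per cache block  */
#define PRED_ENTRIES    1024   /* assumed predictor table size      */

/* Per code-context entry: bit i set => word i was used last time. */
static uint8_t last_usage[PRED_ENTRIES];

/* Hash the PC of the missing instruction into a code-context index;
 * grouping nearby PCs approximates "a contiguous set of instructions". */
static unsigned ctx_index(uint32_t pc)
{
    return (pc >> 4) % PRED_ENTRIES;   /* assumed 16-byte code regions */
}

/* On a cache miss, return the predicted fetch mask (which words to
 * bring in); an empty history falls back to fetching the whole block. */
uint8_t predict_fetch_mask(uint32_t miss_pc)
{
    uint8_t mask = last_usage[ctx_index(miss_pc)];
    return mask ? mask : (uint8_t)((1u << WORDS_PER_BLOCK) - 1);
}

/* When a block's usage is finalized (e.g. on eviction), record which
 * words were actually referenced so the next miss from this code
 * context fetches only those words. */
void update_predictor(uint32_t miss_pc, uint8_t used_mask)
{
    last_usage[ctx_index(miss_pc)] = used_mask;
}

In such a scheme, a word that was predicted unused but later referenced must be fetched separately, which is consistent with the small 0.1% miss-rate impact of mispredictions reported above.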
Keywords/Search Tags: Increasing, L1 data cache, Techniques, Considers the words usage history, Cache capacity, Data brought into the cache, Cache noise, Cache block