
Cache Design And Runtime Performance Optimization Based On Utilization Characteristics

Posted on: 2011-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: L X Xiang
Full Text: PDF
GTID: 2178360302474610
Subject: Computer application technology
Abstract/Summary:
The memory system has long been one of the major bottlenecks of computer system performance: a visit to main memory typically costs several hundred CPU cycles. To narrow the gap between the processor and main memory, cache memories are widely deployed, and the cache has become even more critical due to increasing memory latency and the growing memory requirements of emerging applications. The foundation of caching is the data locality in applications' access streams. However, current cache designs pay little attention either to the locality characteristics of different cache levels or to the varying access behaviors of applications and application phases. As a result, cache performance under such designs is limited by the difficulty of adapting the cache to access behavior. This thesis analyzes the utilization characteristics of different cache levels and proposes corresponding cache designs and runtime optimizations.

For the first-level (L1) cache, this thesis investigates miss locality. Using short miss phases as the metric of program phases, we observe that L1 cache misses are mainly caused by a few leaky sets, which exhibit both good continuity and good predictability. This thesis proposes a structure called the Leaky Set Cache (LSC) to eliminate conflict misses in caches with low associativity. By predicting the locations of leaky sets, the LSC adaptively buffers victims evicted from those sets, reducing conflict misses without lengthening the access latency.

In the L2 cache, the traditional LRU policy behaves poorly for workloads whose working set is larger than the L2 cache, producing a large number of less reused lines that are never reused or reused only a few times. In this case, cache performance can be improved by retaining a portion of the working set in the cache long enough. Previous schemes approach this by bypassing never-reused lines; however, because they are strictly constrained by the number of never-reused lines, they sometimes deliver no benefit at all. This thesis proposes a new filtering mechanism that filters out less reused lines rather than only never-reused lines. The extended scope of bypassing provides more opportunities to fit the working set into the cache, overcoming the limitation of previous schemes. This thesis also proposes the Less Reused Filter (LRF), a separate structure placed in front of the L2 cache, to implement this mechanism. The LRF employs a reuse-frequency predictor to accurately identify less reused lines among incoming lines. Meanwhile, based on our observation that most less reused lines have a short life span, the LRF places filtered lines into a small filter buffer so that they can still be fully utilized, avoiding extra misses. Our evaluation on 24 SPEC 2000 benchmarks shows that augmenting a 512KB LRU-managed L2 cache with an LRF having a 32KB filter buffer reduces the average MPKI by 27.5%, narrowing the gap between LRU and OPT by 74.4%. With an equal number of data lines overall, the LRF outperforms other recent proposals, including the V-Way cache, the dynamic insertion policy, and the shepherd cache.
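The LRF idea above can be illustrated with a minimal simulation sketch. This is not the thesis's actual hardware design: the class names, the fully associative LRU model, and the crude per-address reuse counter standing in for the reuse-frequency predictor are all simplifying assumptions for illustration. The point it shows is the routing decision: lines predicted to be less reused are diverted into a small filter buffer, so they cannot evict the working set retained in the main cache.

```python
from collections import OrderedDict, defaultdict

class LRUCache:
    """Minimal fully associative cache of `capacity` lines with LRU replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, insert addr, evicting the LRU line if full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)      # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[addr] = True
        return False

class FilteredCache:
    """A main LRU cache guarded by a small filter buffer (LRF-style sketch).

    Lines whose observed reuse count is at or below `threshold` are
    predicted 'less reused' and held in the filter buffer instead of
    the main cache, so streaming lines cannot pollute the main cache.
    """
    def __init__(self, main_capacity, filter_capacity, threshold=1):
        self.main = LRUCache(main_capacity)
        self.filter = LRUCache(filter_capacity)
        self.reuse_count = defaultdict(int)   # stand-in for the reuse-frequency predictor
        self.threshold = threshold

    def access(self, addr):
        self.reuse_count[addr] += 1
        if addr in self.main.lines:
            return self.main.access(addr)
        if addr in self.filter.lines:
            return self.filter.access(addr)
        # Miss in both: route by predicted reuse frequency.
        if self.reuse_count[addr] <= self.threshold:
            self.filter.access(addr)          # bypass the main cache
        else:
            self.main.access(addr)            # promote a frequently reused line
        return False
```

Driving this with a trace of a few hot lines followed by a long cold stream shows the intended effect: once the hot lines have been promoted, the stream is absorbed by the filter buffer and the hot lines still hit in the main cache afterwards.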
Keywords/Search Tags: cache performance, utilization characteristics, cache filtering, leaky sets, less reused lines