
Adaptive Cache Management Policies For High Performance Microprocessors

Posted on: 2011-04-18    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X F Sui    Full Text: PDF
GTID: 1118330332469213    Subject: Computer system architecture
Abstract/Summary:
Access to main memory is one of the major constraints on microprocessor performance: memory access is typically two orders of magnitude slower than the processor. To narrow this gap, more than half of the on-chip transistors in a modern processor are devoted to the last-level cache. However, the traditional policies designed for small first-level caches cannot manage a large last-level cache effectively, which leads to a large number of cache misses, frequent off-chip memory accesses, and a serious decline in processor performance. Many factors make cache design challenging and cache management policies ever more critical: growing memory access latency, limited off-chip bandwidth, destructive inter-thread interference, diverse workload characteristics, the increasing working-set size of many emerging applications, and the shrinking share of the shared cache available to each core as the number of cores on a chip grows. This dissertation focuses on several key problems in last-level cache management for high-performance microprocessors, especially multicore processors, and proposes solutions with reasonable overhead to improve their performance. The main contributions of the dissertation are:

1. Fair cache partitioning in simultaneous multithreading (SMT) processors. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, and differences in temporal reuse behavior give workloads unequal ability to compete for cache space, which affects the rates of progress of the co-scheduled threads unevenly. Such unfair cache sharing renders the operating system's thread scheduler ineffective, so thread starvation and priority inversion can arise. An adaptive runtime partitioning (ARP) mechanism is implemented to manage the shared cache. ARP takes fairness improvement as its optimization goal, employs monitoring circuits based on dynamic set sampling to collect stack-distance profiles periodically, and determines the partition with a greedy algorithm divided into a partition phase and a rollback phase. The evaluation shows that, on average, ARP improves the fairness of a 2-way SMT processor by a factor of 2.26 over LRU while increasing throughput by 14.75%.

2. Cache thrashing avoidance in multicore processors. As the associativity of the last-level cache increases, the performance gap between LRU and the theoretically optimal replacement policy widens. The reasons are that LRU causes cache thrashing for memory-intensive workloads whose working sets exceed the available cache size, and that LRU considers only the recency of accesses while ignoring how often blocks are reused. This dissertation shows that the performance of memory-intensive workloads can be improved significantly by evicting dead lines early and by changing the position at which rarely reused lines are inserted or promoted. Based on this idea, the dissertation proposes a shared cache management policy that eliminates dead lines and filters less-reused lines (ELF). ELF predicts the reuse frequency of cache lines with a live-time predictor; using these predictions, it evicts dead lines as early as possible and shortens the time that rarely reused lines stay in the cache. The evaluation on 4-way CMPs shows that ELF improves the weighted speedup by 14.5% on average over the LRU policy.
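The abstract does not spell out ELF's implementation, but its intent can be illustrated with a small, simplified sketch of one cache set. Here a per-line reuse counter stands in for the live-time predictor, lines that have shown no reuse are evicted first, and rarely reused lines are inserted or promoted away from the MRU position so they leave the cache quickly; the class name, threshold, and structure are hypothetical, not the dissertation's design.

```python
# Simplified, illustrative sketch of an ELF-style policy for a single cache set.
# The per-line reuse counter is a stand-in for the live-time predictor described
# in the abstract; names and thresholds are hypothetical.

class ELFSetSketch:
    def __init__(self, assoc=16, low_reuse_threshold=1):
        self.assoc = assoc
        self.low_reuse = low_reuse_threshold
        self.stack = []        # recency stack: index 0 = MRU, last index = LRU
        self.reuse = {}        # tag -> reuse count observed while resident

    def access(self, tag):
        """Return True on a hit, False on a miss (the line is then installed)."""
        if tag in self.stack:
            self.reuse[tag] = self.reuse.get(tag, 0) + 1
            self.stack.remove(tag)
            if self.reuse[tag] <= self.low_reuse:
                # Rarely reused line: promote only to the middle of the stack
                # so it cannot monopolize the MRU side of the set.
                self.stack.insert(len(self.stack) // 2, tag)
            else:
                self.stack.insert(0, tag)      # normal MRU promotion
            return True

        if len(self.stack) >= self.assoc:
            # Prefer evicting a line predicted dead (no reuse observed so far);
            # otherwise fall back to plain LRU eviction.
            dead = [t for t in self.stack if self.reuse.get(t, 0) == 0]
            victim = dead[-1] if dead else self.stack[-1]
            self.stack.remove(victim)
            self.reuse.pop(victim, None)

        # Install new lines near the LRU end; they earn MRU only by being reused.
        self.stack.append(tag)
        self.reuse[tag] = 0
        return False

if __name__ == "__main__":
    # Cyclic pattern of 6 lines over a 4-way set: plain LRU would never hit,
    # while the sketch keeps part of the pattern resident once reuse is seen.
    s = ELFSetSketch(assoc=4)
    print("hits:", sum(s.access(t) for _ in range(4) for t in range(6)))
```

The effect in the toy demo, part of a working set larger than the cache staying resident instead of the whole set cycling through, is the thrashing-avoidance behavior the abstract attributes to early dead-line eviction and filtering of less-reused lines.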
3. Hybrid cache management policies in multicore processors. A single traditional cache management policy cannot satisfy the performance requirements of workloads with different locality characteristics, and under LRU the performance of the last-level cache, and with it the multicore processor, can degrade severely when inter-thread interference occurs or when the working set exceeds the cache size. This dissertation proposes a unified cache management policy called Partitioning-Aware Eviction and Thread-aware Insertion/Promotion (PAE-TIP). PAE-TIP employs a low-overhead set-dueling mechanism to decide where to install incoming lines and where to move hit lines, and it chooses a victim line according to the target partition given by utility-based cache partitioning. It thus combines capacity management with adaptive insertion/promotion and balances memory access behaviors with different locality characteristics. The evaluation on 4-way CMPs shows that PAE-TIP improves overall performance by 19.3% on average over the LRU policy.
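Set dueling, which PAE-TIP uses to pick insertion and promotion positions, can be sketched generically as follows; the leader-set layout, counter width, and class name are illustrative assumptions, not details taken from the dissertation.

```python
# Generic sketch of the set-dueling mechanism named in the abstract.
# The leader-set layout, counter width, and class name are illustrative.

class SetDuelingSelector:
    POLICY_A, POLICY_B = 0, 1

    def __init__(self, num_sets=2048, leaders_per_policy=32, counter_bits=10):
        self.max_psel = (1 << counter_bits) - 1
        self.psel = self.max_psel // 2                     # saturating selector counter
        stride = num_sets // leaders_per_policy
        self.leader_a = set(range(0, num_sets, stride))    # sets hard-wired to policy A
        self.leader_b = set(i + 1 for i in self.leader_a)  # sets hard-wired to policy B

    def policy_for_set(self, set_index):
        """Pick the policy a given set should use right now."""
        if set_index in self.leader_a:
            return self.POLICY_A
        if set_index in self.leader_b:
            return self.POLICY_B
        # Follower sets copy whichever policy's leader sets are missing less.
        return self.POLICY_A if self.psel <= self.max_psel // 2 else self.POLICY_B

    def record_miss(self, set_index):
        """A miss in a leader set counts against that leader's policy."""
        if set_index in self.leader_a and self.psel < self.max_psel:
            self.psel += 1          # policy A just looked worse
        elif set_index in self.leader_b and self.psel > 0:
            self.psel -= 1          # policy B just looked worse

if __name__ == "__main__":
    sel = SetDuelingSelector()
    for s in sel.leader_a:          # pretend policy A's leader sets keep missing
        sel.record_miss(s)
    print(sel.policy_for_set(500))  # follower sets now choose POLICY_B (prints 1)
```

According to the abstract, PAE-TIP duels between insertion/promotion positions rather than whole replacement policies and steers victim selection by the target partition from utility-based cache partitioning, but the selection loop is the same: leader sets sample each alternative, a counter votes, and follower sets adopt the winner.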
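The utility-based cache partitioning that PAE-TIP's eviction relies on, and the profile-driven greedy allocation used by ARP, share the idea of handing out cache ways according to measured marginal benefit. The sketch below is a generic marginal-gain allocator over per-thread hit-versus-ways curves such as a stack-distance monitor would supply; it does not reproduce ARP's fairness objective or its partition and rollback phases, and all names and numbers are invented for illustration.

```python
# Generic marginal-gain sketch of utility-style way partitioning.
# hit_curves[t][w] would come from stack-distance / set-sampling monitors;
# here they are invented numbers, and the function name is hypothetical.

def greedy_partition(hit_curves, total_ways):
    """Give out ways one at a time to the thread with the largest marginal hit gain."""
    ways = [0] * len(hit_curves)
    for _ in range(total_ways):
        gains = [curve[ways[t] + 1] - curve[ways[t]]
                 for t, curve in enumerate(hit_curves)]
        winner = max(range(len(gains)), key=gains.__getitem__)
        ways[winner] += 1
    return ways

if __name__ == "__main__":
    # Thread A saturates after a couple of ways; thread B keeps benefiting.
    curve_a = [0, 90, 95, 96, 96, 96, 96, 96, 96]   # expected hits vs. ways allocated
    curve_b = [0, 20, 40, 60, 75, 85, 92, 96, 99]
    print(greedy_partition([curve_a, curve_b], total_ways=8))   # -> [2, 6]
```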
Keywords/Search Tags: simultaneous multithreading, multicore, last level cache, cache partitioning, cache thrashing, replacement policy, set dueling, dynamic set sampling