
Research on Management Policy of Shared Last-Level Cache for Chip Multiprocessors

Posted on: 2012-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Yin
Full Text: PDF
GTID: 2178330338492049
Subject: Computer system architecture

Abstract/Summary:
With the development of high-performance microprocessors, access to main memory has become a key constraint on processor performance: processor speed exceeds memory-access speed by two orders of magnitude. To narrow this gap, multi-core processor designs most commonly adopt a large-capacity shared last-level cache (LLC). However, the traditional management policies suited to small caches perform poorly on the LLC: they trigger large numbers of LLC misses, leading to expensive off-chip memory accesses and a serious decline in processor performance. There are two main reasons for this. First, a small private cache emphasizes access speed, whereas the LLC tries to retain as much data on chip as possible; it is constrained by the number of available transistors and has less stringent latency demands. Second, the locality visible to these two types of caches differs significantly. An efficient management policy for the LLC is therefore very important for high-performance microprocessors. This thesis studies several active research problems in last-level cache management for microprocessors and proposes efficient solutions with reasonable cost that improve whole-system performance. The main contributions and innovations are:

1. With the development of multi-core technology, CMP designs commonly share a large-capacity LLC among all cores, but the LRU policy traditionally used to manage LLC resources seriously constrains CMP performance. An effective remedy is to evict low-utility data blocks as early as possible, shrinking the workload's resident working set so that high-utility blocks stay in the LLC longer and the processor achieves a higher cache hit rate and better performance. This thesis proposes ELRRIP, an Eliminating-Less-Reused-Blocks and Re-Reference Interval Prediction policy. ELRRIP obtains data-reuse information from the previous-level cache and uses an improved Re-Reference Interval Prediction policy to identify potentially low-utility blocks and evict them as early as possible. A thread-aware dynamic variant, TADELRRIP, manages the LLC more efficiently still. Evaluation on 4-way CMPs shows that TADELRRIP improves overall performance by 9.14% on average over the LRU policy, outperforming TADIP by 4.54% and TADRRIP by 3.56%.
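ELRRIP is built on RRIP-style ageing. Below is a minimal C sketch of that mechanism, assuming a 2-bit re-reference prediction value (RRPV) per block; the predict_low_utility() hook is hypothetical and merely stands in for the reuse information ELRRIP obtains from the previous-level cache:

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS      16                     /* associativity of one LLC set   */
    #define RRPV_BITS 2
    #define RRPV_MAX  ((1 << RRPV_BITS) - 1) /* 3 = "distant" re-reference     */

    typedef struct {
        uint64_t tag;
        uint8_t  rrpv;    /* re-reference prediction value */
        bool     valid;
    } Block;

    /* Hypothetical hook: ELRRIP would derive this from reuse information
     * observed in the previous-level cache. Stubbed so the sketch is
     * self-contained. */
    static bool predict_low_utility(uint64_t tag)
    {
        (void)tag;
        return false;     /* placeholder: assume blocks are useful */
    }

    /* Pick a victim: the first block already predicted "distant".
     * If none exists, age every block by one step and retry. */
    static int select_victim(Block set[WAYS])
    {
        for (;;) {
            for (int w = 0; w < WAYS; w++)
                if (!set[w].valid || set[w].rrpv == RRPV_MAX)
                    return w;
            for (int w = 0; w < WAYS; w++)
                set[w].rrpv++;
        }
    }

    /* Insert a missing block. Predicted low-utility blocks enter with the
     * maximum RRPV so they become eviction candidates immediately; other
     * blocks enter with a "long" interval (RRPV_MAX - 1), as in SRRIP. */
    void handle_miss(Block set[WAYS], uint64_t tag)
    {
        int w = select_victim(set);
        set[w].tag   = tag;
        set[w].valid = true;
        set[w].rrpv  = predict_low_utility(tag) ? RRPV_MAX : RRPV_MAX - 1;
    }

    /* On a hit, promote the block to a "near-immediate" interval. */
    void handle_hit(Block *blk)
    {
        blk->rrpv = 0;
    }

Inserting predicted low-utility blocks with the maximum RRPV makes them the next eviction candidates, which matches the early-eviction behavior described above; the rest of the loop is standard static RRIP.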
2. Modern Chip Multiprocessors (CMPs) integrate multiple cores on a single chip, and these cores share the last-level cache. When applications with different memory-access behaviors compete for the shared LLC, the conventional Least Recently Used (LRU) policy degrades performance, and many techniques have been proposed to improve the performance of the entire CMP. This thesis proposes a new cache replacement policy, TADC, that eliminates the side effects caused by streaming applications and judiciously allocates the precious LLC capacity to those applications that benefit from additional cache ways. TADC divides each cache set equally into subsets, one for each application running on the CMP, and maps each subset to its application. It monitors the memory-access behavior of each application over successive intervals and, based on the behavior observed in the last interval, selects the insertion and promotion policies for each subset. The policy also supports inter-core capacity stealing. TADC improves total Instructions Per Cycle (IPC) throughput by up to 24.3% and 5.94% (7.48% and 3.00% on average) over the baseline LRU policy for dual-core and quad-core workloads respectively.
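The abstract does not specify TADC's interval length or classification thresholds, so the following C sketch only illustrates the interval-based mechanism under assumed parameters: per-core hit counters are sampled once per interval, and a core whose hit rate looks streaming-like has its subset switched to LRU-position insertion so its blocks are evicted quickly. The 5% threshold and the function names are illustrative, not the thesis's actual design:

    #include <stdint.h>

    #define NUM_CORES   4
    #define WAYS        16
    #define SUBSET_WAYS (WAYS / NUM_CORES)  /* equal partition: 4 ways per core */

    /* Per-core counters accumulated over one monitoring interval. */
    typedef struct {
        uint64_t accesses;
        uint64_t hits;
    } IntervalStats;

    typedef enum {
        INSERT_MRU,  /* reuse-friendly: insert at the MRU end of own subset */
        INSERT_LRU   /* streaming-like: insert at the LRU end, evicted soon */
    } InsertPolicy;

    static InsertPolicy policy[NUM_CORES];

    /* A block from core c is normally restricted to its own subset of ways;
     * inter-core capacity stealing would relax this when a neighbor's subset
     * holds only dead blocks (not modeled in this sketch). */
    int subset_base_way(int core)
    {
        return core * SUBSET_WAYS;
    }

    /* At each interval boundary, reclassify every application from its hit
     * rate and pick the next interval's insertion policy for its subset. */
    void end_of_interval(const IntervalStats stats[NUM_CORES])
    {
        for (int c = 0; c < NUM_CORES; c++) {
            double hit_rate = stats[c].accesses
                ? (double)stats[c].hits / (double)stats[c].accesses
                : 0.0;
            policy[c] = (hit_rate < 0.05) ? INSERT_LRU : INSERT_MRU;
        }
    }

Keeping the classification per subset rather than per cache means a streaming application can only demote its own ways, leaving the co-running, reuse-friendly applications' subsets unpolluted.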
Keywords: multiprocessor, last-level cache, LRU policy, cache access behavior, re-reference prediction, cache miss, memory access, cache management policy