
Research On Key Techniques For Optimizing Last Level Cache Performance

Posted on: 2014-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Huang
Full Text: PDF
GTID: 1228330392962191
Subject: Computer system architecture
Abstract/Summary:
Modern processors use multiple levels of cache to bridge the widening speed gap between the processor and main memory. The design of the last-level cache differs from that of the separate L1 instruction and data caches: data reaching the shared last-level cache have already been filtered by the inner caches and therefore exhibit relatively weak locality. Traditional management policies designed for small L1 caches thus make poor use of last-level cache capacity, which seriously limits improvements in processor memory access performance. Effective management of the last-level cache reduces cache misses and is of great practical significance for overall system performance.

The operating system is responsible for allocating physical memory and setting up the mapping between virtual and physical addresses. By adjusting the physical memory allocation policy, the operating system can influence the data layout in the last-level cache so as to optimize data locality and reduce last-level cache misses. Compared with traditional hardware- and compiler-based optimization techniques, this approach has notable advantages: it requires little hardware change and is transparent to applications. Starting from memory management policy and collaborative last-level cache design, and building on related research, we propose several new key techniques for optimizing last-level cache performance. The main contributions of this dissertation are summarized in four aspects:

1. We propose region-based, software-only cache partitioning to reduce last-level cache pollution. When weak-locality data enter the last-level cache, they can evict frequently accessed data and pollute the cache. A profiling feedback mechanism is used to identify the weak-locality polluting data regions in memory-intensive applications.
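Software-only partitioning of this kind is commonly implemented by page coloring: because a physically indexed last-level cache maps each physical page frame to a fixed group of cache sets (a "color"), the operating system can steer a data region into a small cache slice simply by restricting which frames it receives. The sketch below is illustrative only; the cache geometry, the `page_color` and `in_pollute_slice` helpers, and the two-color pollute slice are assumptions, not the dissertation's actual parameters.

```c
#include <stdint.h>

#define LLC_LINE   64     /* bytes per cache line (assumed)        */
#define LLC_SETS   2048   /* e.g. a 2 MiB, 16-way LLC (assumed)    */
#define PAGE_SIZE  4096

/* One 4 KiB page spans PAGE_SIZE / LLC_LINE = 64 consecutive sets,
 * so this LLC has LLC_SETS / 64 = 32 distinct page colors.        */
#define NUM_COLORS (LLC_SETS / (PAGE_SIZE / LLC_LINE))

/* The color of a physical page frame: the low bits of the frame
 * number that select which group of LLC sets the page maps to.    */
static unsigned page_color(uint64_t pfn)
{
    return (unsigned)(pfn % NUM_COLORS);
}

/* Allocation policy sketch: frames whose color falls in a small
 * reserved slice (here colors 0-1, i.e. 1/16 of the LLC) are the
 * only ones handed out for profiled pollute regions; all other
 * data is allocated from the remaining colors.                    */
static int in_pollute_slice(unsigned color)
{
    return color < 2;
}
```

A page allocator following this policy would confine every weak-locality page to 1/16 of the cache sets, leaving the rest of the last-level cache free of pollution.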
This approach maps the polluting data set onto a small last-level cache slice by modifying the physical page frame allocation policy of the operating system. It protects good-locality data in the last-level cache and reduces last-level cache misses. Experimental results show that, compared with the existing Linux operating system, this approach reduces the LLC MPKI (last-level cache misses per 1000 instructions) by 15.23% on average and improves application performance by 7.01% on average.

2. We propose a shared last-level cache optimization that combines process-based cache partitioning with pollute-region isolation. Concurrent processes, and the data regions within each process, contend for the shared last-level cache of a multicore processor, causing serious data access conflicts. This approach detects and identifies the polluting region sets of a given application under different shared last-level cache size configurations. It sets up a global pollute buffer in the shared last-level cache that maps the weak-locality data regions of all concurrent processes together, further improving shared last-level cache utilization on multicore processors in multiprocessing environments. Our experimental results show that, compared with the existing Linux operating system and the process-based cache partitioning scheme RapidMRC, this approach improves multicore system performance by 26.31% and 5.86% on average, respectively.

3. We propose a page-grain last-level cache insertion policy controlled by software with lightweight hardware support. Because of the limited information it records, hardware-based last-level cache management has difficulty exploiting the distinct behaviors of the different data regions of a given application. This approach builds a page-grain software-controlled interface to the last-level cache by using the reserved bits in the page table entries of existing processors.
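Page table entries on current architectures leave some bits unused by the hardware walker (on x86-64, for example, bits 52-58 of a PTE are available to software). A minimal sketch of how two such bits could carry a per-page insertion hint follows; the bit positions, the `llc_hint` encoding, and the helper names are illustrative assumptions, not the dissertation's actual interface.

```c
#include <stdint.h>

/* Assumed layout: two software-available PTE bits at position 52
 * encode a per-page LLC insertion hint.                           */
#define PTE_HINT_SHIFT 52
#define PTE_HINT_MASK  (3ULL << PTE_HINT_SHIFT)

enum llc_hint {
    HINT_DEFAULT      = 0,  /* hardware's normal insertion position */
    HINT_LOW_PRIORITY = 1,  /* pollute page: insert near eviction   */
    HINT_BYPASS       = 2,  /* no expected reuse: skip the LLC      */
};

/* Write a hint into a PTE without disturbing its other fields.    */
static uint64_t pte_set_hint(uint64_t pte, enum llc_hint h)
{
    return (pte & ~PTE_HINT_MASK) | ((uint64_t)h << PTE_HINT_SHIFT);
}

/* Read the hint back; hardware would forward it with each miss.   */
static enum llc_hint pte_get_hint(uint64_t pte)
{
    return (enum llc_hint)((pte & PTE_HINT_MASK) >> PTE_HINT_SHIFT);
}
```

In such a design, the TLB would propagate the hint alongside the physical address, so the last-level cache controller can pick the insertion position per page with no change to the application.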
Guided by profiling information, it controls the last-level cache insertion position of polluting-region data at page grain. Compared with a hardware-only insertion policy based on set dueling, it further reduces last-level cache pollution with little hardware overhead. Our experimental results show that, compared with LRU, DIP, and DRRIP, our approach reduces MPKI by 14.33%, 9.68%, and 6.24% on average, respectively, and improves processor performance by 8.3%, 6.23%, and 4.24% on average, respectively.

4. We propose software-hardware collaborative last-level cache management for virtual address regions. The data of a contiguous virtual address region are scattered over physical page frames at run time. Existing last-level cache performance monitors have difficulty gathering statistics over this distribution and cannot support runtime optimization. First, we design a region-based performance monitor to record online the last-level cache access information of the different data regions of a given application. Then, with the support of this monitor, we design an online locality profiler. Finally, we design a software-controlled interface to the last-level cache. Guided by profiling information, the operating system configures the bypass and insertion policies of different data regions according to their memory access behaviors. This approach requires little additional hardware and effectively improves last-level cache utilization. Our experimental results show that, compared with LRU, DIP, and DRRIP, processor performance is improved by 8.05%, 5.94%, and 4.01% on average, respectively.
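The final step above, choosing a bypass or insertion policy per region from monitored behavior, can be sketched as a simple classifier over per-region hit statistics. The `region_stats` structure, the policy names, and the 5%/30% thresholds below are illustrative assumptions for the sketch, not values from the dissertation.

```c
#include <stdint.h>

/* Per-region counters as a region-based LLC monitor might report. */
struct region_stats {
    uint64_t accesses;  /* LLC accesses issued from this region */
    uint64_t hits;      /* LLC hits among those accesses        */
};

enum region_policy {
    POLICY_DEFAULT,     /* normal hardware insertion             */
    POLICY_LRU_INSERT,  /* weak locality: insert near eviction   */
    POLICY_BYPASS       /* essentially no reuse: skip the LLC    */
};

/* Classify a region by its observed LLC hit ratio; the OS would
 * then program the chosen policy through the software interface. */
static enum region_policy choose_policy(struct region_stats s)
{
    if (s.accesses == 0)
        return POLICY_DEFAULT;          /* no evidence yet */
    double hit_ratio = (double)s.hits / (double)s.accesses;
    if (hit_ratio < 0.05)
        return POLICY_BYPASS;           /* assumed threshold */
    if (hit_ratio < 0.30)
        return POLICY_LRU_INSERT;       /* assumed threshold */
    return POLICY_DEFAULT;
}
```

The point of the sketch is the division of labor: hardware supplies cheap per-region counters, while software makes the (infrequent) policy decision.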
Keywords/Search Tags: last level cache, operating system, insertion policy, bypassing, software and hardware collaboration