
Research On Key Techniques For Optimizing Last Level Cache Performance

Posted on: 2014-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Huang
Full Text: PDF
GTID: 1228330392962191
Subject: Computer system architecture
Abstract/Summary:
Modern processors use multiple levels of cache to bridge the widening speed gap between the processor and main memory. The design of the last-level cache differs from that of the separate L1 instruction and data caches: data reaching the shared last-level cache have already been filtered by the inner caches and therefore exhibit relatively weak locality. Traditional management policies designed for small L1 caches thus make poor use of last-level cache capacity, which seriously limits improvements in processor memory access performance. Effective management of the last-level cache reduces cache misses and is of great practical significance for overall system performance.

The operating system is responsible for allocating physical memory and setting up the mapping between virtual and physical addresses. By adjusting the physical memory allocation policy, the operating system can influence the data layout in the last-level cache so as to optimize data locality and reduce last-level cache misses. Compared with traditional hardware- and compiler-based optimization techniques, this approach has notable advantages: it requires little hardware change and is transparent to applications. Starting from memory management policy and collaborative last-level cache design, and building on related research, we propose several new key techniques for optimizing last-level cache performance. The main contributions of this dissertation are summarized in four aspects:

1. We propose region-based, software-only cache partitioning to reduce last-level cache pollution. When weak-locality data enter the last-level cache, they can evict frequently accessed data and pollute the cache. A profiling feedback mechanism is used to identify the weak-locality polluting data regions in memory-intensive applications.
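Software-only partitioning of this kind is commonly implemented by page coloring: because a physically indexed last-level cache maps each physical page frame to a fixed group of cache sets (a "color"), the operating system can steer a data region into a small cache slice simply by restricting which frames it receives. The sketch below is illustrative only; the cache geometry, the `page_color` and `in_pollute_slice` helpers, and the two-color pollute slice are assumptions, not the dissertation's actual parameters.

```c
#include <stdint.h>

#define LLC_LINE   64     /* bytes per cache line (assumed)        */
#define LLC_SETS   2048   /* e.g. a 2 MiB, 16-way LLC (assumed)    */
#define PAGE_SIZE  4096

/* One 4 KiB page spans PAGE_SIZE / LLC_LINE = 64 consecutive sets,
 * so this LLC has LLC_SETS / 64 = 32 distinct page colors.        */
#define NUM_COLORS (LLC_SETS / (PAGE_SIZE / LLC_LINE))

/* The color of a physical page frame: the low bits of the frame
 * number that select which group of LLC sets the page maps to.    */
static unsigned page_color(uint64_t pfn)
{
    return (unsigned)(pfn % NUM_COLORS);
}

/* Allocation policy sketch: frames whose color falls in a small
 * reserved slice (here colors 0-1, i.e. 1/16 of the LLC) are the
 * only ones handed out for profiled pollute regions; all other
 * data is allocated from the remaining colors.                    */
static int in_pollute_slice(unsigned color)
{
    return color < 2;
}
```

A page allocator following this policy would confine every weak-locality page to 1/16 of the cache sets, leaving the rest of the last-level cache free of pollution.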
This approach maps the polluting data set onto a small last-level cache slice by modifying the physical page frame allocation policy of the operating system. It protects good-locality data in the last-level cache and reduces last-level cache misses. Experimental results show that, compared with the existing Linux operating system, this approach reduces the LLC MPKI (last-level cache misses per 1000 instructions) by 15.23% on average and improves application performance by 7.01% on average.

2. We propose a shared last-level cache optimization that combines process-based cache partitioning with pollute-region isolation. Concurrent processes, and the data regions within each process, contend for the shared last-level cache of a multicore processor, causing serious data access conflicts. This approach detects and identifies the polluting region sets of a given application under different shared last-level cache size configurations. It sets up a global pollute buffer in the shared last-level cache that maps the weak-locality data regions of all concurrent processes together, further improving shared last-level cache utilization on multicore processors in multiprocessing environments. Our experimental results show that, compared with the existing Linux operating system and the process-based cache partitioning scheme RapidMRC, this approach improves multicore system performance by 26.31% and 5.86% on average, respectively.

3. We propose a page-grain last-level cache insertion policy controlled by software with lightweight hardware support. Because of the limited information it records, hardware-based last-level cache management has difficulty exploiting the distinct behaviors of the different data regions of a given application. This approach builds a page-grain software-controlled interface to the last-level cache by using the reserved bits in the page table entries of existing processors.
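Page table entries on current architectures leave some bits unused by the hardware walker (on x86-64, for example, bits 52-58 of a PTE are available to software). A minimal sketch of how two such bits could carry a per-page insertion hint follows; the bit positions, the `llc_hint` encoding, and the helper names are illustrative assumptions, not the dissertation's actual interface.

```c
#include <stdint.h>

/* Assumed layout: two software-available PTE bits at position 52
 * encode a per-page LLC insertion hint.                           */
#define PTE_HINT_SHIFT 52
#define PTE_HINT_MASK  (3ULL << PTE_HINT_SHIFT)

enum llc_hint {
    HINT_DEFAULT      = 0,  /* hardware's normal insertion position */
    HINT_LOW_PRIORITY = 1,  /* pollute page: insert near eviction   */
    HINT_BYPASS       = 2,  /* no expected reuse: skip the LLC      */
};

/* Write a hint into a PTE without disturbing its other fields.    */
static uint64_t pte_set_hint(uint64_t pte, enum llc_hint h)
{
    return (pte & ~PTE_HINT_MASK) | ((uint64_t)h << PTE_HINT_SHIFT);
}

/* Read the hint back; hardware would forward it with each miss.   */
static enum llc_hint pte_get_hint(uint64_t pte)
{
    return (enum llc_hint)((pte & PTE_HINT_MASK) >> PTE_HINT_SHIFT);
}
```

In such a design, the TLB would propagate the hint alongside the physical address, so the last-level cache controller can pick the insertion position per page with no change to the application.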
Guided by profiling information, it controls the last-level cache insertion position of polluting-region data at page grain. Compared with a hardware-only insertion policy based on set dueling, it further reduces last-level cache pollution with little hardware overhead. Our experimental results show that, compared with LRU, DIP, and DRRIP, our approach reduces MPKI by 14.33%, 9.68%, and 6.24% on average, respectively, and improves processor performance by 8.3%, 6.23%, and 4.24% on average, respectively.

4. We propose software-hardware collaborative last-level cache management for virtual address regions. The data of a contiguous virtual address region are scattered over physical page frames at run time. Existing last-level cache performance monitors have difficulty gathering statistics over this distribution and cannot support runtime optimization. First, we design a region-based performance monitor to record online the last-level cache access information of the different data regions of a given application. Then, with the support of this monitor, we design an online locality profiler. Finally, we design a software-controlled interface to the last-level cache. Guided by profiling information, the operating system configures the bypass and insertion policies of different data regions according to their memory access behaviors. This approach requires little additional hardware and effectively improves last-level cache utilization. Our experimental results show that, compared with LRU, DIP, and DRRIP, processor performance is improved by 8.05%, 5.94%, and 4.01% on average, respectively.
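The final step above, choosing a bypass or insertion policy per region from monitored behavior, can be sketched as a simple classifier over per-region hit statistics. The `region_stats` structure, the policy names, and the 5%/30% thresholds below are illustrative assumptions for the sketch, not values from the dissertation.

```c
#include <stdint.h>

/* Per-region counters as a region-based LLC monitor might report. */
struct region_stats {
    uint64_t accesses;  /* LLC accesses issued from this region */
    uint64_t hits;      /* LLC hits among those accesses        */
};

enum region_policy {
    POLICY_DEFAULT,     /* normal hardware insertion             */
    POLICY_LRU_INSERT,  /* weak locality: insert near eviction   */
    POLICY_BYPASS       /* essentially no reuse: skip the LLC    */
};

/* Classify a region by its observed LLC hit ratio; the OS would
 * then program the chosen policy through the software interface. */
static enum region_policy choose_policy(struct region_stats s)
{
    if (s.accesses == 0)
        return POLICY_DEFAULT;          /* no evidence yet */
    double hit_ratio = (double)s.hits / (double)s.accesses;
    if (hit_ratio < 0.05)
        return POLICY_BYPASS;           /* assumed threshold */
    if (hit_ratio < 0.30)
        return POLICY_LRU_INSERT;       /* assumed threshold */
    return POLICY_DEFAULT;
}
```

The point of the sketch is the division of labor: hardware supplies cheap per-region counters, while software makes the (infrequent) policy decision.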
Keywords/Search Tags: last level cache, operating system, insertion policy, bypassing, software and hardware collaboration