
Resource Allocation And Structure Design For Bottom Level Caches

Posted on: 2018-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R T Guo
Full Text: PDF
GTID: 1368330566951359
Subject: Computer software and theory
Abstract/Summary:
Multicore and manycore architectures have steadily improved the performance of modern processors, but at the same time they demand much higher data throughput. Meanwhile, few breakthroughs have been made in DRAM technology in recent years, which makes the "Memory Wall" problem more severe. The key lies in advances in the cache system. To narrow the performance gap between processor and memory, the capacity of the shared SRAM cache has been significantly enlarged, enabling it to cover a wider range of reuse distances. In addition, a new 3D DRAM cache layer has been added between the SRAM cache and main memory, offering higher bandwidth as well as larger capacity.

These cache advances show great potential to break through the "Memory Wall", but they also introduce many challenges. First, due to the filtering effect of the upper-level caches, data accesses with high temporal locality cannot reach the bottom cache layers, which makes those layers more prone to cache pollution. Second, the bottom cache layers are commonly shared among cores, yet free contention does not lead to optimal cache resource allocation. A further challenge comes from the new cache material: the DRAM cache has device properties that differ from those of the SRAM cache, and neither the set-associative structure nor the direct-mapped structure achieves optimal performance. The set-associative structure offers a high hit rate but suffers high hit latency; the direct-mapped structure guarantees low hit latency but provides a low hit rate.

In this work, we propose three techniques to meet these challenges: a cache-aware memory allocation scheme, a light-weight dynamic cache partition scheme, and a partial-direct-mapped DRAM cache structure.

The cache-aware memory allocation scheme works at the library layer. It provides general, transparent, and low-overhead pollution control to applications. It extends memory mapping into two types: restrictive mapping, which confines the pollution effect of poor-locality data, and open mapping, which is used for cache-friendly data. When a malloc request arrives, the system predicts the access locality of the memory to be allocated, determines its cache demand, and selects the proper mapping type for the request (a minimal sketch of this allocation path is given below). The design is based on the observation that data within the same memory chunk, or chunks from the same allocation context, often share similar locality properties. The system embodies this observation by monitoring current cache locality online to predict future behavior and by proactively restricting potential cache polluters. Experimental results show that the cache-aware memory allocator improves application performance by up to 45%, with an average monitoring overhead of 0.57%.

The light-weight dynamic cache partition scheme works at the operating system layer. It offers light-weight, transparent, and phase-adaptive cache partition services to multi-program workloads. The system is built on three key techniques: a multilayer phase monitor, a fractal-based miss-rate-curve monitor, and an on-demand page recoloring manager. The phase monitor keeps track of each process's access behavior; when a process changes its execution phase, the system updates its miss rate curve and adjusts the cache partition to the new access pattern (a page-coloring sketch also follows below). Evaluation results show that the dynamic cache partition scheme is consistently better than hardware-based free contention and static cache partitioning, with an average overhead of less than 2%.
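To make the allocation path concrete, the following C sketch illustrates the idea under stated assumptions: the per-call-site table, the `REUSE_THRESHOLD` constant, and the fallback to plain `malloc()` are illustrative choices, not the dissertation's actual interface. A real implementation would back restrictive mapping with page-colored memory regions and would update the locality records from online cache monitoring.

```c
/* A minimal sketch of a cache-aware allocation path, assuming a
 * hypothetical interface; not the dissertation's actual API. */
#include <stdlib.h>
#include <stdint.h>

typedef enum { MAP_OPEN, MAP_RESTRICTIVE } map_type_t;

#define CTX_SLOTS       256
#define REUSE_THRESHOLD 0.5   /* below this, treat the context as a polluter */

/* Per-allocation-context locality record, keyed by call site.
 * Observation from the work: chunks from the same allocation context
 * tend to share locality behavior. */
struct ctx_record {
    uintptr_t site;        /* return address identifying the malloc context */
    double    reuse_ratio; /* locality of past chunks; online updates omitted */
};

static struct ctx_record ctx_table[CTX_SLOTS];

static struct ctx_record *ctx_lookup(uintptr_t site)
{
    struct ctx_record *r = &ctx_table[site % CTX_SLOTS];
    if (r->site != site) {       /* new or evicted context: start optimistic */
        r->site = site;
        r->reuse_ratio = 1.0;
    }
    return r;
}

/* Cache-aware malloc: predict the new chunk's locality from its allocation
 * context, then choose restrictive mapping for likely polluters and open
 * mapping for cache-friendly data. In this sketch both paths degenerate to
 * malloc(); a real system would draw restrictive chunks from page-colored
 * regions that map to a confined slice of the shared cache. */
void *ca_malloc(size_t size)
{
    uintptr_t site = (uintptr_t)__builtin_return_address(0);
    map_type_t type = (ctx_lookup(site)->reuse_ratio < REUSE_THRESHOLD)
                          ? MAP_RESTRICTIVE : MAP_OPEN;
    (void)type; /* sketch: both mapping pools fall back to the default heap */
    return malloc(size);
}
```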
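The on-demand page recoloring manager relies on standard page-color arithmetic: physical pages that differ in the cache set-index bits above the page offset can never contend for the same cache sets, so colors form the natural unit of partitioning. The C sketch below illustrates this arithmetic for an assumed 16 MB, 16-way cache with 4 KB pages; the geometry and the `partition` structure are illustrative assumptions, not taken from the dissertation.

```c
/* A minimal sketch of the page-color arithmetic behind OS-level cache
 * partitioning. Cache geometry below is illustrative. */
#include <stdint.h>
#include <stdio.h>

#define CACHE_BYTES (16u << 20)  /* shared bottom-level cache size */
#define CACHE_WAYS  16u
#define PAGE_BYTES  4096u
#define PAGE_SHIFT  12u

/* Number of page colors: one color per group of cache sets that a
 * single page can occupy. Here: 16 MB / (16 * 4 KB) = 256 colors. */
#define NUM_COLORS  (CACHE_BYTES / (CACHE_WAYS * PAGE_BYTES))

/* A physical page's color is fixed by the set-index bits above the
 * page offset. */
static unsigned page_color(uint64_t phys_addr)
{
    return (unsigned)((phys_addr >> PAGE_SHIFT) & (NUM_COLORS - 1));
}

/* Grant a process the colors [base, base+count) modulo NUM_COLORS. When
 * the phase monitor reports a new miss rate curve, the manager changes
 * base/count and recolors pages on demand (copy to a free page of a
 * granted color, then remap) instead of migrating everything eagerly. */
struct partition { unsigned base, count; };

static int color_allowed(const struct partition *p, uint64_t phys_addr)
{
    unsigned c = page_color(phys_addr);
    return ((c + NUM_COLORS - p->base) % NUM_COLORS) < p->count;
}

int main(void)
{
    struct partition p = { .base = 0, .count = 64 }; /* a quarter of the cache */
    printf("colors=%u, page 0x%llx -> color %u, allowed=%d\n",
           NUM_COLORS, 0x12345000ULL, page_color(0x12345000ULL),
           color_allowed(&p, 0x12345000ULL));
    return 0;
}
```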
The partial-direct-mapped DRAM cache structure is applied at the hardware layer. The design is motivated by the observation that no unified mapping policy in a DRAM cache can achieve both a high hit rate and low hit latency. The proposed scheme classifies data into leading blocks and following blocks, and places them with static mapping and dynamic mapping, respectively, within a unified cache structure (a lookup sketch is given below). The design also includes a novel replacement policy that balances miss penalty against access hotness, and provides strategies to mitigate cache thrashing caused by block-type variations. Experimental results demonstrate that the partial-direct-mapped cache structure achieves a hit rate comparable to a set-associative cache and a hit latency comparable to a direct-mapped cache.

In summary, this work proposes three cache management techniques to meet the challenges of the modern cache system, focusing on cache utilization as well as cache organization. Several key issues are studied, including online locality analysis, runtime cache allocation, and DRAM cache structure.
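As a rough illustration of how static and dynamic mapping can coexist in one structure, the following C sketch pins a leading block to a single statically mapped way, so a leading-block hit costs one tag check as in a direct-mapped cache, while following blocks are searched associatively across the remaining ways. The geometry, field names, and way assignment are assumptions made for illustration, not the dissertation's actual organization.

```c
/* A minimal sketch of a partial-direct-mapped lookup: way 0 of each set
 * is the static slot for leading blocks; ways 1..7 hold following blocks
 * placed dynamically. Parameters are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 4096u
#define NUM_WAYS 8u

struct line { uint64_t tag; bool valid; bool leading; };
static struct line dcache[NUM_SETS][NUM_WAYS];

/* Returns the way index on a hit, -1 on a miss. */
static int lookup(uint64_t block_addr, bool is_leading)
{
    uint32_t set = (uint32_t)(block_addr % NUM_SETS);
    uint64_t tag = block_addr / NUM_SETS;
    struct line *s = dcache[set];

    /* Leading blocks probe only their static slot: a single tag check,
     * giving direct-mapped hit latency. */
    if (is_leading)
        return (s[0].valid && s[0].leading && s[0].tag == tag) ? 0 : -1;

    /* Following blocks search the dynamic ways associatively, recovering
     * set-associative hit rate for the remaining data. */
    for (unsigned w = 1; w < NUM_WAYS; w++)
        if (s[w].valid && s[w].tag == tag)
            return (int)w;
    return -1;
}

int main(void)
{
    uint64_t blk = 0xABCDEu;
    uint32_t set = (uint32_t)(blk % NUM_SETS);
    /* Install a leading block in its static slot, then probe it. */
    dcache[set][0] = (struct line){ .tag = blk / NUM_SETS,
                                    .valid = true, .leading = true };
    return lookup(blk, true) == 0 ? 0 : 1;
}
```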
Keywords/Search Tags:Shared Cache, Locality Analysis, Cache Partition, Cache Organization, Replacement Policy