Data placement optimizations for multilevel cache hierarchies

Posted on: 2005-11-18
Degree: Ph.D.
Type: Dissertation
University: University of Virginia
Candidate: Coleman, Clark L
Full Text: PDF
GTID: 1458390008477979
Subject: Computer Science
Abstract/Summary:
As compiler optimizations have increasingly focused on the memory hierarchy, a variety of efforts have attempted to reduce cache misses in first-level instruction and data caches. Placing code to reduce instruction cache misses, and placing data to reduce data cache misses, have both been shown to benefit a variety of application programs. However, most of this work has been limited to reducing first-level cache misses. Careful examination of modern computer architectures reveals opportunities for a data placement optimization framework that targets several means of performance improvement at once. Cache hierarchies have recently extended as deep as three levels, each with a different miss penalty, so cache misses need to be reduced at all levels to maximize performance. Reducing TLB (translation lookaside buffer) misses and virtual memory page use is also desirable. In addition, global and local variables can be addressed with addressing modes of differing costs, and the less expensive modes can be used more frequently if the data placement optimization takes this goal into account.

A multi-goal data placement framework has been developed to enable all of these optimizations. A novel method of static data affinity analysis, followed by a data placement optimization that uses hierarchical graph partitioning and local refinement, makes it possible to reduce cache misses throughout the cache hierarchy while also increasing page and TLB locality and enabling the address mode and bus cycle optimizations. An original method of characterizing the parameters of the cache and TLB hierarchy that are needed for the profiling and optimizations, using hardware performance counters, helps make the entire data placement framework practical and portable. The static data affinity analysis avoids the practical difficulties inherent in past research that relied on expensive dynamic profiling runs.
The hierarchical graph partitioning approach to data placement is able to make use of Chaco, a well-tested, off-the-shelf graph partitioning library. Extensive measurements, using timings and cache simulations for Sun UltraSparc-II machines, demonstrate the effectiveness of the data placement optimizations.
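
The idea of partitioning an affinity graph and then laying data out by partition can be illustrated with a small sketch. This is not the dissertation's algorithm and does not use Chaco: it assumes a hypothetical affinity graph given as weighted variable pairs, uses a naive pairwise-swap refinement in place of a real partitioner, and ignores alignment. All names here (`bisect`, `order_by_affinity`, `assign_offsets`, `line_size`) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (var, var) -> weight, where a higher weight means the
# two variables tend to be accessed close together in time.
Affinity = Dict[Tuple[str, str], int]


def cut_weight(left: List[str], right: List[str], affinity: Affinity) -> int:
    """Total affinity weight on edges that cross the two groups."""
    a, b = set(left), set(right)
    return sum(w for (u, v), w in affinity.items()
               if (u in a and v in b) or (u in b and v in a))


def bisect(names: List[str], affinity: Affinity) -> Tuple[List[str], List[str]]:
    """Split names in half, then swap pairs while the cut weight shrinks.

    A naive stand-in for a real graph partitioner with local refinement.
    """
    half = len(names) // 2
    left, right = list(names[:half]), list(names[half:])
    improved = True
    while improved:
        improved = False
        best = cut_weight(left, right, affinity)
        for i in range(len(left)):
            for j in range(len(right)):
                left[i], right[j] = right[j], left[i]
                cost = cut_weight(left, right, affinity)
                if cost < best:
                    best, improved = cost, True
                else:
                    left[i], right[j] = right[j], left[i]  # undo the swap
    return left, right


def order_by_affinity(names: List[str], sizes: Dict[str, int],
                      affinity: Affinity, line_size: int) -> List[str]:
    """Recursively bisect until each group fits in one cache line."""
    if len(names) <= 1 or sum(sizes[n] for n in names) <= line_size:
        return list(names)
    left, right = bisect(names, affinity)
    return (order_by_affinity(left, sizes, affinity, line_size)
            + order_by_affinity(right, sizes, affinity, line_size))


def assign_offsets(layout: List[str], sizes: Dict[str, int]) -> Dict[str, int]:
    """Pack variables back to back in layout order (alignment ignored)."""
    offsets, cursor = {}, 0
    for name in layout:
        offsets[name] = cursor
        cursor += sizes[name]
    return offsets
```

For example, with four 8-byte variables `a`, `b`, `c`, `d`, a 16-byte line, and strong affinity between `a` and `c` and between `b` and `d`, the sketch places each strongly related pair in the same line. Because the ordering comes from recursive bisection, related groups also end up adjacent at coarser granularities such as pages, which is the intuition behind targeting several hierarchy levels at once.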
Keywords/Search Tags: Cache, Data placement, Optimizations