Access Behavior Analysis And Optimization Of Caches For Chip Multi-processors

Posted on: 2012-04-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X M Jia
Full Text: PDF
GTID: 1118330341451744
Subject: Electronic Science and Technology
Abstract/Summary:
Chip Multi-Processors (CMPs) have emerged as the mainstream architecture of choice in both the marketplace and academia, because they are a more power-efficient, scalable, and cost-effective design alternative with lower design complexity than single-core designs. A recent trend in large server systems and data centers is the rise of server consolidation and virtual computing, which makes multiprogrammed workloads with diverse applications the likely prevailing workloads for future CMP platforms. Architects tend to adopt simple cores in CMPs, so the main design challenge falls on the memory system. As the semiconductor industry heads into the nanometer era, the on-chip cache hierarchy of CMPs has to cope with several new challenges, such as long off-chip memory access latency, limited off-chip bandwidth, diverse workload characteristics, growing on-chip wire delay, and destructive inter-thread interference. The design and management of the on-chip cache hierarchy, especially the large non-first level caches, are becoming more critical to CMPs than ever.

The access behavior that a cache experiences has an important impact on the efficacy of the cache as well as on the system performance of CMPs. Precise analysis of cache access behavior is therefore a promising new way to govern and optimize the cache hierarchy of CMPs. This thesis investigates approaches for analyzing the access behavior of large non-first level caches of CMPs, together with optimization mechanisms built on that analysis. The proposed mechanisms mainly target multiprogrammed workloads, but also seek to boost the performance of multithreaded workloads. The main contributions are as follows:

1. OABI, an online application cache behavior type identification approach for large non-first level caches of CMPs, is proposed, based on qualitative analysis and detailed simulation with the SPEC CPU2006 benchmarks. OABI classifies applications into five cache behavior types according to how their miss rate varies with cache size and according to their number of misses, and then introduces the hardware support needed to identify application cache behavior types online. OABI is of great value to the management and optimization of large on-chip non-first level caches.

2. BIIP, an online application cache behavior identification-based replacement policy for a large shared Last-Level Cache (LLC), is proposed on the basis of the OABI approach. BIIP combines online application cache behavior type identification with replacement policy selection. It identifies the cache behavior type of each application using OABI, and then assigns the most suitable cache insertion policy to the application according to that type. Experimental results show that BIIP can significantly improve the space utilization of the shared LLC, boosting overall system performance.

3. BICS, an online application cache behavior identification-based management mechanism for large private LLCs, is proposed on the basis of the OABI approach. BICS uses a private cache organization for its shorter access latency and good performance isolation. It identifies the cache behavior types of applications with OABI at runtime. When a cache block is evicted from a private LLC, the cache behavior type of the local application is evaluated to determine whether the block may be stored at remote peer LLCs. Remotely stored blocks are allowed to replace some valid blocks of the peer LLCs as long as the interference stays within a reasonable level. BICS also takes into account the non-uniformity of access latencies to different private LLCs, seeking to reduce the average hit latency as much as possible. Experimental results show that BICS balances the capacity utilization of private LLCs well, and thus can improve overall system performance.

4. BP-NUCA, a cache set access pressure-aware management mechanism for large private LLCs, is proposed. BP-NUCA is also based on a private cache organization for its shorter access latency and good performance isolation. It introduces a low-cost hardware structure that dynamically measures the access pressure exerted on each cache set at runtime. This per-set access pressure information is then used to move some blocks from highly accessed cache sets to less utilized cache sets with the same index address in the peer LLCs. BP-NUCA also takes into account the non-uniformity of access latencies to different private LLCs, trying to reduce the average hit latency as much as possible. Experimental results show that BP-NUCA can effectively improve the capacity utilization of private LLCs, leading to better overall system performance.

5. A thorough study is performed to understand how the non-uniform distribution of memory accesses across cache sets affects the system performance of CMPs. Several optimization mechanisms that aim to balance the memory access distribution across cache sets are proposed for shared and for private non-first level caches of CMPs, respectively. Evaluation of these mechanisms leads to the conclusion that, on CMP platforms, the non-uniform distribution of memory accesses across cache sets is partially circumvented by the interactions between multiple applications. Efforts to exploit this non-uniformity for extra benefit may therefore prove futile in CMPs.
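To make the set-level spilling idea in contribution 4 more concrete, the following Python sketch shows one way a pressure-aware spill target could be chosen. It is an illustration under stated assumptions, not the mechanism from the thesis: the abstract does not say how access pressure is measured, so the saturating per-set counter (raised on a miss, lowered on a hit), the thresholds `HIGH_PRESSURE` and `LOW_PRESSURE`, the `latency` matrix, and the names `PrivateLLC` and `pick_spill_target` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical parameters; the abstract gives no concrete values.
COUNTER_MAX = 15     # assumed 4-bit saturating counter per cache set
HIGH_PRESSURE = 12   # assumed: a local set at or above this may spill
LOW_PRESSURE = 4     # assumed: a peer set at or below this may receive


@dataclass
class PrivateLLC:
    """One private LLC, reduced to its per-set pressure counters."""
    num_sets: int
    pressure: list = field(default_factory=list)

    def __post_init__(self):
        self.pressure = [0] * self.num_sets

    def record_access(self, set_index: int, hit: bool) -> None:
        # Assumed pressure metric: misses raise it, hits lower it (saturating).
        p = self.pressure[set_index]
        self.pressure[set_index] = max(p - 1, 0) if hit else min(p + 1, COUNTER_MAX)


def pick_spill_target(llcs, latency, local, set_index):
    """Return the index of the peer LLC that should receive a block evicted
    from set `set_index` of LLC `local`, or None if no spill should happen."""
    if llcs[local].pressure[set_index] < HIGH_PRESSURE:
        return None  # local set is not under enough pressure to spill
    # Only the same-index set in each peer is considered, as in the abstract.
    candidates = [
        peer for peer in range(len(llcs))
        if peer != local and llcs[peer].pressure[set_index] <= LOW_PRESSURE
    ]
    if not candidates:
        return None
    # Prefer the closest peer to keep hit latency low; break ties by pressure.
    return min(
        candidates,
        key=lambda peer: (latency[local][peer], llcs[peer].pressure[set_index]),
    )


# Usage: three private LLCs; latency[i][j] is an assumed access cost in cycles.
llcs = [PrivateLLC(num_sets=4) for _ in range(3)]
latency = [[0, 10, 20], [10, 0, 10], [20, 10, 0]]
for _ in range(13):
    llcs[0].record_access(2, hit=False)   # set 2 of LLC 0 becomes hot
target = pick_spill_target(llcs, latency, local=0, set_index=2)
print(target)  # prints 1 (the nearest peer with an idle same-index set)
```

Choosing the nearest qualifying peer first reflects the abstract's stated goal of reducing average hit latency. A real design would also need a lookup path for spilled blocks, which this sketch leaves out.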
Keywords/Search Tags: Chip Multi-Processors, Non-First Level Cache, Last Level Cache, Application Cache Behavior Identification, Spilling, Non-Uniform Distribution of Memory Accesses across Cache Sets