
Study on L2 Cache Replacement Algorithms for CMP Architecture

Posted on: 2009-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: H T Jiang
Full Text: PDF
GTID: 2178360272974093
Subject: Computer software and theory
Abstract/Summary:
Over the past 40 years, the gap between processor speed and memory access speed has widened steadily. The resulting "memory wall" has become one of the most important bottlenecks in overall system performance. Modern computer architectures use caches widely to narrow this gap. In a typical chip multiprocessor (CMP) architecture, the L2 cache is shared among the cores, which improves cache utilization and avoids duplicating cache hardware. Unfortunately, a branch misprediction on any core can cause load misses on the wrong path to write useless data into the shared L2 cache, polluting it. This pollution can cause additional cache misses, degrade the performance of other cores that can no longer occupy enough L2 cache space, distribute memory resources unfairly, and even cause starvation. The cache replacement algorithm is therefore key to efficiency: how to raise the cache hit rate and achieve the highest possible processor performance at the lowest possible cost has become an important topic in cache research. This thesis studies cache resource allocation and shared-cache replacement strategies for CMPs in detail. Based on a detailed analysis of the pseudo-LRU algorithm, we improve it and develop the FPLRU replacement algorithm. We design and implement a BIB that records relevant information about the predicted path, so that L2 cache data fetched on the wrong path is replaced as early as possible and data locality is better exploited. Benchmark experiments show that FPLRU reduces the miss rate noticeably compared with pseudo-LRU. When several threads run simultaneously on a CMP, contention for the shared L2 cache can increase the number of cache misses and degrade performance.
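As a rough illustration of the replacement idea described above, the following Python sketch models one cache set managed by tree-based pseudo-LRU, with an optional bias toward evicting lines filled on a mispredicted path. It is not the thesis's FPLRU or BIB design, whose details the abstract does not give. The class name `TreePLRUSet`, the per-line `wrong_path` flag, and the `prefer_wrong_path` switch are all assumptions made for this example.

```python
class TreePLRUSet:
    """One cache set with tree pseudo-LRU replacement (hypothetical sketch)."""

    def __init__(self, ways=4, prefer_wrong_path=True):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.prefer_wrong_path = prefer_wrong_path  # assumed knob, not from the thesis
        self.bits = [0] * (ways - 1)      # per tree node: 0 = victim on the left, 1 = on the right
        self.tags = [None] * ways         # tag stored in each way (None = empty)
        self.wrong_path = [False] * ways  # assumed flag: line was filled on a wrong path

    def _touch(self, way):
        # Walk from the root to the accessed way, pointing every node away from it.
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0
                node, lo = 2 * node + 2, mid

    def _plru_victim(self):
        # Follow the tree bits from the root to the pseudo-least-recently-used way.
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo

    def access(self, tag, wrong_path=False):
        """Access a line; return True on a hit, False on a miss (the line is then filled)."""
        if tag in self.tags:
            way = self.tags.index(tag)
            if not wrong_path:
                self.wrong_path[way] = False  # reuse on the correct path: line is useful
            self._touch(way)
            return True
        if None in self.tags:
            way = self.tags.index(None)             # fill an empty way first
        elif self.prefer_wrong_path and True in self.wrong_path:
            way = self.wrong_path.index(True)       # evict wrong-path data first
        else:
            way = self._plru_victim()               # fall back to plain pseudo-LRU
        self.tags[way] = tag
        self.wrong_path[way] = wrong_path
        if not (self.prefer_wrong_path and wrong_path):
            self._touch(way)                        # wrong-path fills are not promoted
        return False
```

With `prefer_wrong_path=False` the class behaves as plain tree pseudo-LRU, so the two settings can be compared on the same access sequence. In a 4-way set that holds three useful lines and one wrong-path line, the biased variant evicts the wrong-path line on the next miss, whereas plain pseudo-LRU evicts a useful one.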
To investigate how a thread's performance varies when it runs concurrently with other threads on the remaining cores, we develop a Shared-Cache model that predicts the number of misses in the shared L2 cache. The model mainly uses circular sequence profiling and stack processing techniques to analyze the L2 cache trace and predict, separately, the number of compulsory cache misses, capacity misses on shared data, and capacity misses on private data. Validation experiments show that the model predicts L2 cache misses accurately. In addition, this thesis analyzes the SimpleScalar and CMP-SIM simulators, studies how they simulate and implement their cache mechanisms, and then carries out experiments using the CMP-SIM simulator and benchmark programs. Finally, the algorithms above are evaluated and analyzed in detail.
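The stack processing step mentioned above can be illustrated with a minimal Python sketch of LRU stack-distance profiling. It is a simplified single-thread version and not the thesis's Shared-Cache model: it assumes a fully associative LRU cache, does not perform circular sequence profiling, and does not separate shared from private data. The function names `stack_distance_profile` and `predict_misses` are hypothetical.

```python
from collections import Counter


def stack_distance_profile(trace):
    """Return (compulsory_misses, histogram of LRU stack distances) for a trace.

    The stack distance of a reference is the number of distinct lines
    touched since the previous reference to the same line.
    """
    stack = []            # LRU stack, most recently used line at the end
    histogram = Counter()
    compulsory = 0
    for line in trace:
        if line in stack:
            idx = stack.index(line)
            histogram[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            compulsory += 1   # first reference to this line
        stack.append(line)
    return compulsory, histogram


def predict_misses(trace, capacity_lines):
    """Predict (compulsory, capacity) misses for a fully associative LRU cache."""
    compulsory, histogram = stack_distance_profile(trace)
    # A re-reference misses when at least `capacity_lines` distinct lines
    # were touched since its previous use.
    capacity = sum(n for dist, n in histogram.items() if dist >= capacity_lines)
    return compulsory, capacity


trace = ["a", "b", "c", "a", "b", "d", "a"]
print(predict_misses(trace, 2))  # (4, 3)
print(predict_misses(trace, 3))  # (4, 0)
```

Because the histogram is independent of the cache size, one pass over the trace gives miss predictions for any capacity. This is the property that makes stack-based profiling attractive for prediction models.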
Keywords/Search Tags:CMP, Cache replacement, FPLRU, Shared-Cache