
Dynamic Cache Partitioning Under CMP Structure

Posted on: 2012-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2178330335450292
Subject: Computer system architecture

Abstract/Summary:
Constrained by power dissipation and chip-design complexity, the traditional superscalar processor can no longer meet growing performance demands. Meanwhile, with the development of computer hardware technology, the chip multiprocessor (CMP) has become a research hotspot. As the number of cores integrated on a single chip increases, the huge speed gap between processors and the memory system is fully exposed, and cache pollution becomes another thorny issue. With more and more threads executing in parallel, the standard cache replacement policy, LRU, does not distinguish accesses from different processors: a miss from one application can evict a cache block belonging to another. The resulting conflicting accesses between concurrent processes degrade overall performance. Moreover, execution time can become unpredictable because of conflict misses, since the execution time of one application depends on its co-running applications.

To address cache pollution, research has focused on shared cache partitioning, cache coherence, and related techniques. Shared cache partitioning divides the last-level shared cache into independent regions so that each process owns its own portion; by restricting cache accesses in this way, cache pollution is avoided. In addition, regions of different sizes can be allocated to applications according to their needs.

Representative cache partitioning methods include dynamic cache partitioning based on processor performance, on fairness, and on quality of service. These methods approach partitioning from different points of view, aiming respectively to reduce the overall miss rate of the shared cache, maximize system throughput, ensure fairness among competing applications, and optimize quality of service.
Most of these methods use a monitoring circuit or miss-rate monitor to observe accesses to the shared cache: auxiliary cache sets simulate the state each competing application would see in the shared cache, and stack distance profiling then computes each application's miss rate. Utility-based cache partitioning (UCP), IPC-based cache partitioning, and similar algorithms use a greedy strategy, assigning cache ways one by one to applications in an N-way set-associative cache.

This paper proposes a shared cache partitioning algorithm based on both throughput (IPC) and fairness, aiming to strike a balance between the two. We take FairIPC = N / Σᵢ (SingleIPCᵢ / IPCᵢ) as the objective function, pursuing throughput optimization on the basis of system fairness. A cache access monitor collects each thread's miss information, counters record the hit and miss numbers, and stack distance profiling computes each application's miss rate for each possible cache size it might hold. From the relationship between miss rate and IPC we then obtain each thread's IPC, and from that we can calculate how to partition the cache.

After analyzing the miss rates of memory-access-intensive programs, we found that the concavity of the miss-rate function is unpredictable, and therefore so is the concavity of the cache partitioning gain function. We conclude that the traditional greedy algorithm is not optimal. Considering that backtracking and heuristic graph-search control strategies incur large hardware and time overheads, we adopt an irrevocable (hill-climbing) control strategy instead, and propose a look-ahead algorithm that can be shown to outperform the greedy algorithm.

For the implementation of cache partitioning, we build the partition module as a loadable module that can be invoked at run time.
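The stack-distance-based miss estimation and look-ahead way allocation described above can be sketched as follows. This is an illustrative Python reconstruction, not the thesis's implementation: the function names and the two-application example are ours, and the allocation loop follows the general UCP-style look-ahead idea of assigning, at each step, the block of ways with the highest marginal utility per way.

```python
def misses(hist, total_accesses, ways):
    """Estimated misses if an application holds `ways` ways.

    hist[d] is the number of hits observed at LRU stack depth d+1,
    as collected by the per-thread access monitor; with `ways` ways,
    only hits at depth <= ways remain hits.
    """
    return total_accesses - sum(hist[:ways])

def lookahead_partition(hists, accesses, n_ways):
    """Assign n_ways cache ways among applications.

    Unlike the one-way-at-a-time greedy algorithm, each step considers
    every feasible block of extra ways and picks the (app, block) pair
    with the highest miss reduction *per way*, which copes with
    miss-rate curves of unpredictable concavity.
    """
    n = len(hists)
    alloc = [0] * n
    left = n_ways
    while left > 0:
        best = None  # (utility per way, app index, extra ways)
        for i in range(n):
            base = misses(hists[i], accesses[i], alloc[i])
            for extra in range(1, left + 1):
                gain = base - misses(hists[i], accesses[i], alloc[i] + extra)
                u = gain / extra
                if best is None or u > best[0]:
                    best = (u, i, extra)
        _, i, extra = best
        alloc[i] += extra
        left -= extra
    return alloc
```

For example, with a cache-hungry application and a streaming one sharing a 4-way cache, the look-ahead pass gives most ways to the application whose stack-distance histogram shows the larger marginal hit gain.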
We modified the least-recently-used (LRU) replacement algorithm to implement way partitioning: an array per core identifies the cache ways that belong to that core, and when a replacement is needed, only the ways belonging to the requesting core may be replaced.

Experiments show that dynamic shared cache partitioning is an effective way to resolve cache pollution, yielding higher system throughput and weighted speedup. The improvements in system throughput are 24.08% (UCP), 15.40% (FCP), and 22.26% (I-FCP), while the improvements in weighted speedup are 23.01% (UCP), 15.34% (FCP), and 21.18% (I-FCP). Although I-FCP may decrease system throughput slightly compared with utility-based cache partitioning, it clearly improves system fairness, by a factor of 1.67 (and up to several times). Compared with fairness-based cache partitioning, I-FCP improves throughput by 8.06% and weighted speedup by 5.70%. Finally, we suggest new research directions, including using thread priorities, pursuing a new combined optimization metric, and using the number of threads for multi-channel multi-threaded applications.
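The way-partitioned LRU replacement described above can be sketched for a single cache set as follows. This is an illustrative Python model, not the thesis's hardware design: the class and field names are ours, and timestamps stand in for the LRU ordering bits. The key property is that lookup still hits anywhere in the set, while eviction is restricted to the ways owned by the requesting core.

```python
class WayPartitionedSet:
    """One set of an N-way cache with per-core way ownership and
    LRU replacement restricted to the requesting core's ways."""

    def __init__(self, n_ways, owner):
        # owner[w] = id of the core that way w belongs to
        # (the per-core ownership array described above, flattened)
        self.owner = owner
        self.tags = [None] * n_ways
        self.stamp = [0] * n_ways   # larger stamp = more recently used
        self.clock = 0

    def access(self, core, tag):
        """Return True on hit, False on miss (with fill)."""
        self.clock += 1
        # Lookup may hit in any way: partitioning restricts
        # replacement, not access.
        for w, t in enumerate(self.tags):
            if t == tag:
                self.stamp[w] = self.clock
                return True
        # Miss: evict the LRU line among this core's own ways only,
        # so one core's misses never pollute another core's ways.
        mine = [w for w in range(len(self.tags)) if self.owner[w] == core]
        victim = min(mine, key=lambda w: self.stamp[w])
        self.tags[victim] = tag
        self.stamp[victim] = self.clock
        return False
```

For example, in a 4-way set split evenly between cores 0 and 1, a burst of misses from core 0 cycles through ways 0 and 1 only, leaving core 1's lines in ways 2 and 3 untouched.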
Keywords/Search Tags:Shared Cache, Dynamic Partitioning, IPC, Fairness