
Cache Partitioning Policies On Chip Multi-processors For Scientific Applications

Posted on: 2010-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Suo
Full Text: PDF
GTID: 1118360305473661
Subject: Computer Science and Technology
Abstract/Summary:
With chip multi-processors (CMPs) widely used in high performance computing, performance optimization of scientific applications for CMPs has become a hot topic. Most mainstream CMPs today use a shared cache, but interference between simultaneously executing threads reduces performance. Cache partitioning mechanisms can partition the shared cache among individual applications to avoid interference between them. In a multiprogrammed environment, cache partitioning can improve overall performance, reduce the miss rate, and enhance fairness. However, for scientific applications, whether cache partitioning can really improve performance, and how, is still an open problem. Considering these problems, this thesis presents a further study of cache partitioning mechanisms on CMPs with a shared cache, oriented to scientific applications. The innovations of this thesis are as follows:

Firstly, the space contention cache model (SCCM) for the shared cache is proposed. For a shared cache using the LRU policy, SCCM can predict the cache space occupancy, the miss rate, and the collision probability among the concurrent processes accessing the shared cache. The experimental results show that SCCM provides better precision than the Prob Model. Moreover, SCCM is also applied to analyze the performance parameters in both the ideal case and the real case when MPI (Message Passing Interface) applications use the shared cache. In the ideal case, considering a 2-way shared cache, the performance parameters of an MPI application whose 2 processes share the cache are modeled. In addition, compared with using the shared cache directly, sufficient conditions under which the cache partitioning method achieves a lower miss rate are given.
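The space contention that SCCM models can be illustrated with a small simulation. The sketch below is not the thesis's model; it is a hypothetical experiment (all cache geometry, footprints, and access patterns are assumed for illustration) in which two processes with different working-set sizes interleave accesses to a shared set-associative LRU cache, and we measure each process's occupancy fraction and miss rate, the quantities SCCM predicts analytically:

```python
import random
from collections import OrderedDict

def simulate_shared_lru(num_sets=64, ways=8, accesses=20000,
                        footprints=(256, 4096), seed=0):
    """Interleave two processes' accesses to a shared LRU cache and
    measure each process's miss rate and space occupancy.
    All parameters are illustrative, not taken from the thesis."""
    rng = random.Random(seed)
    # one OrderedDict per set, ordered from LRU to MRU; keys are (pid, block)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = [0, 0]
    counts = [0, 0]
    for _ in range(accesses):
        pid = rng.randrange(2)
        block = rng.randrange(footprints[pid])  # uniform working set per process
        s = sets[(pid * 10007 + block) % num_sets]
        key = (pid, block)
        counts[pid] += 1
        if key in s:
            s.move_to_end(key)          # hit: refresh recency
        else:
            misses[pid] += 1
            if len(s) >= ways:
                s.popitem(last=False)   # evict the least recently used line
            s[key] = None
    # occupancy: how many cache lines each process holds at the end
    occ = [0, 0]
    for s in sets:
        for (pid, _) in s:
            occ[pid] += 1
    total = sum(len(s) for s in sets)
    miss_rates = [m / c for m, c in zip(misses, counts)]
    return miss_rates, [o / total for o in occ]
```

Running this, the process with the larger working set suffers the higher miss rate while both compete for cache space, which is exactly the kind of interaction a contention model must capture.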
In the real case, SCCM is applied to analyze the space occupancy, miss rate, and collision probability between processes of real MPI applications. Finally, we conclude that the cache partitioning method can improve the performance of MPI applications.

Secondly, for load-balanced MPI applications, spatial-level cache partitioning (SLCP) is proposed. Based on the information collected by the Miss Rate Monitor (MRM), the cache partitioning problem is reduced to a dynamic programming problem by means of the performance prediction model in SLCP; depending on whether all the IPC curves of the concurrent processes are non-decreasing convex functions, the dynamic programming problem is solved under those two conditions. Since SLCP may cause load imbalance, time-level cache partitioning (TLCP) is proposed on top of SLCP. TLCP partitions the cache in the time dimension so as to preserve the load balance of the MPI application. The NPB test cases are executed to evaluate SLCP and TLCP in detail. The results show that for load-balanced MPI applications with large working sets, SLCP and TLCP achieve the same speedup in most cases; in the few cases where SLCP causes load imbalance, TLCP compensates for it and obtains a higher speedup.

Thirdly, for multiprogrammed multithreaded workloads based on OpenMP, weighted cache partitioning (WCP) is proposed. All existing cache partitioning methods deal with multiprogrammed single-threaded workloads, and none of them considers the difference in the number of threads across processes.
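The reduction of cache partitioning to dynamic programming can be sketched as follows. This is a generic way-partitioning DP, not SLCP itself: the per-process IPC-versus-ways curves are assumed inputs (SLCP would derive them from its MRM-based performance prediction model), and the DP chooses an allocation of cache ways that maximizes predicted total IPC:

```python
def partition_ways(ipc_curves, total_ways):
    """Dynamic-programming way partitioning in the spirit of SLCP.
    ipc_curves[p][w] = predicted IPC of process p when given w ways,
    for w = 0..total_ways. The curves here are hypothetical inputs."""
    P = len(ipc_curves)
    NEG = float("-inf")
    # best[p][w] = max total IPC using processes 0..p-1 and exactly w ways
    best = [[NEG] * (total_ways + 1) for _ in range(P + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(P + 1)]
    best[0][0] = 0.0
    for p in range(1, P + 1):
        for w in range(total_ways + 1):
            for give in range(w + 1):          # ways granted to process p-1
                if best[p - 1][w - give] == NEG:
                    continue
                val = best[p - 1][w - give] + ipc_curves[p - 1][give]
                if val > best[p][w]:
                    best[p][w] = val
                    choice[p][w] = give
    # trace back the per-process allocation
    alloc, w = [], total_ways
    for p in range(P, 0, -1):
        g = choice[p][w]
        alloc.append(g)
        w -= g
    return list(reversed(alloc)), best[P][total_ways]
```

For example, two processes with identical concave (diminishing-returns) IPC curves split an 8-way cache evenly; when the curves are all non-decreasing and concave, the same optimum can be found greedily, which is why the shape of the IPC curves matters to SLCP's case analysis.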
Building on traditional cache partitioning aimed at lowering the miss rate, WCP takes into account the influence of thread counts on the shared cache partition and assigns each process a weight according to its number of threads. The experimental results show that although the miss rate increases slightly when WCP is applied, the IPC throughput, the weighted speedup, and the fairness are all improved.

Fourthly, for load-imbalanced MPI applications, the cache partitioning based two-level load balancing framework (CPTLLBF) is proposed. CPTLLBF uses load balancing information provided by the executing processes in real time and performs load balancing operations dynamically, including partial load balancing and global load balancing. The partial load balancing operation is performed inside a CMP: it partitions the shared cache among the processes executed on that CMP so as to achieve load balance within the CMP. The global load balancing operation is performed across the whole parallel computer: according to the processes' demand for the shared cache, it dynamically maps the processes onto the different CMPs owned by the system, then launches partial load balancing operations on the CMPs used by the MPI application to ensure load balance across the whole system. The experimental results show that CPTLLBF effectively reduces the execution time of load-imbalanced MPI applications.
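The idea of weighting processes by thread count can be sketched with a simple greedy marginal-utility allocator. This is not the thesis's WCP algorithm; the miss curves, weights, and greedy policy are illustrative assumptions, but the sketch shows the core effect: with equal cache demand, a process running more threads receives proportionally more of the shared cache:

```python
def weighted_partition(miss_curves, threads, total_ways):
    """Greedy weighted way allocation in the spirit of WCP.
    miss_curves[p][w] = predicted misses of process p with w ways
    (length total_ways + 1); each process's marginal miss reduction
    is weighted by its thread count. All inputs are hypothetical."""
    P = len(miss_curves)
    alloc = [0] * P
    for _ in range(total_ways):
        # weighted benefit of granting each process one more way
        gains = [threads[p] * (miss_curves[p][alloc[p]] - miss_curves[p][alloc[p] + 1])
                 for p in range(P)]
        winner = max(range(P), key=lambda p: gains[p])
        alloc[winner] += 1
    return alloc
```

With two processes sharing the same miss curve but running 1 and 3 threads respectively, the 3-thread process ends up with the larger share of an 8-way cache; a pure miss-rate-minimizing partition would split it evenly, which is why WCP can trade a slightly higher miss rate for better throughput and fairness.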
Keywords/Search Tags: Chip Multi-processor, Scientific Applications, Shared Cache, Cache Partitioning, Space Contention Cache Model, Spatial-level Cache Partitioning, Time-level Cache Partitioning, Weighted Cache Partitioning, Load Balancing