
On-Chip Cache Management With Performance Monitoring Hardware Support

Posted on: 2014-01-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Liu
GTID: 1228330398464252    Subject: Computer system architecture
Abstract/Summary:
Using on-chip cache resources efficiently is a critical issue in chip multiprocessor (CMP) research. Software transparency is a principal advantage of hardware-managed caches, but it also means the cache is unaware of a program's memory access behavior and of the differing demands of concurrently running threads. On one hand, this causes inter-thread cache interference when multiple threads run on a multi-core system: existing cache management schemes cannot guarantee the performance of individual programs, leading to unpredictable cache contention and poor system throughput. On the other hand, it makes caching inefficient for running programs, single-threaded programs in particular, because software cannot control cache space allocation, so a great deal of on-chip cache space is wasted.

This dissertation focuses on three aspects of cache resource management: monitoring the behavior of running programs, managing cache contention among multiple threads, and allocating cache space under software control. We implemented a low-cost scheme for monitoring the behavior of running programs, improved system throughput and performance stability for multiprogrammed workloads, and provided cache control mechanisms for single-threaded execution. The major contributions of this dissertation are:

(1) A low-cost performance monitoring tool named LWM, built on the performance monitoring units (PMUs) embedded in modern processors. With LWM, low-level information about running programs can be accessed at user level. A performance event record is added to each task structure, and a system call interface is provided for event configuration. Performance counter overflows and miscounting are handled correctly across context switches, and an optimized hardware counter multiplexing mechanism improves both event monitoring precision and counter utilization. (A sketch of the kind of user-level PMU access LWM builds on follows this list.)

(2) The concept of memory load, and a memory load balance (MLB) scheduling algorithm that improves system throughput and the performance stability of running programs. Modeled on load balancing in the operating system, the MLB algorithm is implemented at user level and requires no modification of the kernel, so it can serve as an auxiliary facility alongside the existing process scheduler. Compared with other scheduling algorithms, MLB achieves better weighted speedup and system throughput and eliminates a large number of off-chip memory requests. More importantly, MLB is stable, reducing performance deviation between runs, which opens the way to task scheduling with fairness and reliability guarantees. (A sketch of one balancing step appears after this list.)

(3) A cache control mechanism named VSCP that improves the caching efficiency of single-threaded programs. VSCP unifies the cache space of the whole system and gives programmers an interface for allocating it: physically distributed caches are virtualized as a single block of centrally controllable cache. Rather than parallelizing a single-threaded program to exploit all computing resources, VSCP avoids the reprogramming effort while still making intensive use of cache resources. It also saves power, because only a single thread needs to run at any given time. (A page-coloring sketch of software-controlled cache partitioning follows this list.)
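LWM's own interface is not reproduced in this abstract, so the C sketch below uses Linux's standard perf_event_open(2) system call instead, purely to illustrate the kind of user-level hardware counter access that a tool like LWM builds on; the choice of event and the minimal error handling are illustrative assumptions.

    /* Minimal sketch: count hardware cache misses for this process from
     * user level via perf_event_open(2). Not LWM's actual API. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <sys/types.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;  /* last-level cache misses */
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = perf_event_open(&attr, 0 /* self */, -1 /* any CPU */, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... region of interest ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t misses;
        read(fd, &misses, sizeof(misses));
        printf("cache misses: %llu\n", (unsigned long long)misses);
        close(fd);
        return 0;
    }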
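The MLB algorithm itself is not given in the abstract; the sketch below shows one plausible user-level balancing step under stated assumptions: per-task memory intensities (for example, LLC misses per kilo-instruction) have already been sampled with a tool such as LWM, tasks are pre-sorted by intensity, and the machine has two shared-cache domains. The task IDs, load values, and CPU layout are hypothetical.

    /* Sketch of one memory-load-balance step: greedily assign the
     * heaviest task to the cache domain with the least accumulated
     * load, then pin it there with sched_setaffinity(2). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>
    #include <stdio.h>

    #define NTASKS   4
    #define NDOMAINS 2                      /* e.g., two shared-LLC domains */

    int main(void)
    {
        pid_t  tids[NTASKS] = { 101, 102, 103, 104 };  /* hypothetical tasks  */
        double load[NTASKS] = { 9.5, 7.2, 1.1, 0.8 };  /* sampled MPKI, sorted */
        int    domain_cpu[NDOMAINS]  = { 0, 4 };  /* first CPU of each domain */
        double domain_load[NDOMAINS] = { 0.0, 0.0 };

        for (int i = 0; i < NTASKS; i++) {
            int best = 0;                   /* least-loaded domain so far */
            for (int d = 1; d < NDOMAINS; d++)
                if (domain_load[d] < domain_load[best]) best = d;
            domain_load[best] += load[i];

            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(domain_cpu[best], &set);
            if (sched_setaffinity(tids[i], sizeof(set), &set) != 0)
                perror("sched_setaffinity");
            printf("task %d -> domain %d\n", (int)tids[i], best);
        }
        return 0;
    }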
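The abstract does not state how VSCP maps allocations onto the physical caches; page coloring is a common software mechanism for this kind of cache partitioning, and the sketch below shows how a page's color is derived from the cache geometry. Restricting an allocation to a subset of colors confines its data to a slice of the shared cache. All cache parameters here are illustrative assumptions.

    /* Sketch of the page-coloring idea behind software-controlled cache
     * partitioning. The cache-set index bits above the page offset
     * define a page "color"; pages of the same color contend for the
     * same cache sets. */
    #include <stdio.h>

    #define PAGE_SHIFT 12            /* 4 KiB pages */
    #define CACHE_SIZE (2u << 20)    /* 2 MiB shared LLC (assumed) */
    #define ASSOC      16
    #define LINE_SIZE  64

    int main(void)
    {
        unsigned sets      = CACHE_SIZE / (ASSOC * LINE_SIZE); /* 2048 sets */
        unsigned set_bytes = sets * LINE_SIZE;  /* bytes spanned by set index */
        unsigned colors    = set_bytes >> PAGE_SHIFT;          /* 32 colors  */

        unsigned long pfn = 0x12345;          /* example physical frame number */
        unsigned color = pfn & (colors - 1);  /* low PFN bits select the color */

        printf("%u colors; frame 0x%lx has color %u\n", colors, pfn, color);
        return 0;
    }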
Our cache management research yielded several important insights. (1) Against the background of the increasingly serious "memory wall," memory access performance is critical both to the execution of a single program and to whole-system throughput; reducing the cache miss rate is becoming more important than reducing instruction counts. (2) Existing cache management schemes, including operating system task scheduling and cache replacement policies, cannot observe inter-thread cache contention, which makes their management inefficient. Cache management should be implemented in a thread-aware manner; otherwise it cannot provide guarantees of performance, fairness, or quality of service. (3) Software-hardware co-design is the best approach to the cache resource contention problem. We need new interfaces between application runtimes and cache management, better performance monitoring infrastructure (in both hardware and software) that permits better observability of what is happening inside the system, and better mechanisms for fine-grained resource allocation in hardware. Addressing these problems will require the interdisciplinary effort of operating system designers, hardware architects, and application developers.

The cache management schemes proposed in this work are practical and have been implemented on real systems. The solutions are general and can inform future system architectures.
Keywords/Search Tags: chip multiprocessor, shared resources management, performance monitoring, memory load balance, cache controlling