
On-Chip Cache Management With Performance Monitoring Hardware Support

Posted on: 2014-01-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Liu
GTID: 1228330398464252    Subject: Computer system architecture
Abstract/Summary:
Using on-chip cache resources efficiently is a critical issue in chip multiprocessor (CMP) research. Software transparency is a principal advantage of hardware-managed caches, but it also means the cache is unaware of a program's memory access behavior and of the differing demands of concurrently running threads. On one hand, this causes inter-thread cache interference when multiple threads run on a multi-core system: existing cache management schemes cannot guarantee the performance of individual programs, leading to unpredictable cache contention and poor system throughput. On the other hand, it makes caching inefficient for running programs, single-threaded programs in particular, because software cannot control cache space allocation, so a great deal of on-chip cache space is wasted.

This dissertation focuses on three aspects of cache resource management: monitoring the behavior of running programs, managing cache contention among multiple threads, and allocating cache space under software control. We implemented a low-cost scheme for monitoring the behavior of running programs, improved system throughput and performance stability for multiprogrammed workloads, and provided cache control mechanisms for single-threaded execution. The major contributions of this dissertation are:

(1) A low-cost performance monitoring tool named LWM, built on the performance monitoring units (PMUs) embedded in modern processors. With LWM, low-level information about running programs can be accessed at user level. A performance event record is added to each task structure, and a system call interface is provided for event configuration. Performance counter overflows and miscounting are handled correctly across context switches, and an optimized hardware counter multiplexing mechanism improves both event monitoring precision and counter utilization. (A sketch of the kind of user-level PMU access LWM builds on follows this list.)

(2) The concept of memory load, and a memory load balance (MLB) scheduling algorithm that improves system throughput and the performance stability of running programs. Modeled on load balancing in the operating system, the MLB algorithm is implemented at user level and requires no modification of the kernel, so it can serve as an auxiliary facility alongside the existing process scheduler. Compared with other scheduling algorithms, MLB achieves better weighted speedup and system throughput and eliminates a large number of off-chip memory requests. More importantly, MLB is stable, reducing performance deviation between runs, which opens the way to task scheduling with fairness and reliability guarantees. (A sketch of one balancing step appears after this list.)

(3) A cache control mechanism named VSCP that improves the caching efficiency of single-threaded programs. VSCP unifies the cache space of the whole system and gives programmers an interface for allocating it: physically distributed caches are virtualized as a single block of centrally controllable cache. Rather than parallelizing a single-threaded program to exploit all computing resources, VSCP avoids the reprogramming effort while still making intensive use of cache resources. It also saves power, because only a single thread needs to run at any given time. (A page-coloring sketch of software-controlled cache partitioning follows this list.)
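LWM's own interface is not reproduced in this abstract, so the C sketch below uses Linux's standard perf_event_open(2) system call instead, purely to illustrate the kind of user-level hardware counter access that a tool like LWM builds on; the choice of event and the minimal error handling are illustrative assumptions.

    /* Minimal sketch: count hardware cache misses for this process from
     * user level via perf_event_open(2). Not LWM's actual API. */
    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <sys/types.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;  /* last-level cache misses */
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        int fd = perf_event_open(&attr, 0 /* self */, -1 /* any CPU */, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... region of interest ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t misses;
        read(fd, &misses, sizeof(misses));
        printf("cache misses: %llu\n", (unsigned long long)misses);
        close(fd);
        return 0;
    }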
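The MLB algorithm itself is not given in the abstract; the sketch below shows one plausible user-level balancing step under stated assumptions: per-task memory intensities (for example, LLC misses per kilo-instruction) have already been sampled with a tool such as LWM, tasks are pre-sorted by intensity, and the machine has two shared-cache domains. The task IDs, load values, and CPU layout are hypothetical.

    /* Sketch of one memory-load-balance step: greedily assign the
     * heaviest task to the cache domain with the least accumulated
     * load, then pin it there with sched_setaffinity(2). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <sys/types.h>
    #include <stdio.h>

    #define NTASKS   4
    #define NDOMAINS 2                      /* e.g., two shared-LLC domains */

    int main(void)
    {
        pid_t  tids[NTASKS] = { 101, 102, 103, 104 };  /* hypothetical tasks  */
        double load[NTASKS] = { 9.5, 7.2, 1.1, 0.8 };  /* sampled MPKI, sorted */
        int    domain_cpu[NDOMAINS]  = { 0, 4 };  /* first CPU of each domain */
        double domain_load[NDOMAINS] = { 0.0, 0.0 };

        for (int i = 0; i < NTASKS; i++) {
            int best = 0;                   /* least-loaded domain so far */
            for (int d = 1; d < NDOMAINS; d++)
                if (domain_load[d] < domain_load[best]) best = d;
            domain_load[best] += load[i];

            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(domain_cpu[best], &set);
            if (sched_setaffinity(tids[i], sizeof(set), &set) != 0)
                perror("sched_setaffinity");
            printf("task %d -> domain %d\n", (int)tids[i], best);
        }
        return 0;
    }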
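The abstract does not state how VSCP maps allocations onto the physical caches; page coloring is a common software mechanism for this kind of cache partitioning, and the sketch below shows how a page's color is derived from the cache geometry. Restricting an allocation to a subset of colors confines its data to a slice of the shared cache. All cache parameters here are illustrative assumptions.

    /* Sketch of the page-coloring idea behind software-controlled cache
     * partitioning. The cache-set index bits above the page offset
     * define a page "color"; pages of the same color contend for the
     * same cache sets. */
    #include <stdio.h>

    #define PAGE_SHIFT 12            /* 4 KiB pages */
    #define CACHE_SIZE (2u << 20)    /* 2 MiB shared LLC (assumed) */
    #define ASSOC      16
    #define LINE_SIZE  64

    int main(void)
    {
        unsigned sets      = CACHE_SIZE / (ASSOC * LINE_SIZE); /* 2048 sets */
        unsigned set_bytes = sets * LINE_SIZE;  /* bytes spanned by set index */
        unsigned colors    = set_bytes >> PAGE_SHIFT;          /* 32 colors  */

        unsigned long pfn = 0x12345;          /* example physical frame number */
        unsigned color = pfn & (colors - 1);  /* low PFN bits select the color */

        printf("%u colors; frame 0x%lx has color %u\n", colors, pfn, color);
        return 0;
    }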
Our cache management research yielded several important insights. (1) Against the background of the increasingly serious "memory wall," memory access performance is critical both to the execution of a single program and to whole-system throughput; reducing the cache miss rate is becoming more important than reducing instruction counts. (2) Existing cache management schemes, including operating system task scheduling and cache replacement policies, cannot observe inter-thread cache contention, which makes their management inefficient. Cache management should be implemented in a thread-aware manner; otherwise it cannot provide guarantees of performance, fairness, or quality of service. (3) Software-hardware co-design is the best approach to the cache resource contention problem. We need new interfaces between application runtimes and cache management, better performance monitoring infrastructure (in both hardware and software) that permits better observability of what is happening inside the system, and better mechanisms for fine-grained resource allocation in hardware. Addressing these problems will require the interdisciplinary effort of operating system designers, hardware architects, and application developers.

The cache management schemes proposed in this work are practical and have been implemented on real systems. The solutions are general and can inform future system architectures.
Keywords/Search Tags: chip multiprocessor, shared resources management, performance monitoring, memory load balance, cache controlling