Optimizations Of Memory Subsystem For Chip Multiprocessor Systems

Posted on:2014-09-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J H Li

Full Text:PDF

GTID:1268330425969857

Subject:Computer software and theory

Abstract/Summary:

Modern chip multiprocessors (CMPs) employ large cache memories to reduce the performance gap between processors and off-chip memory. This thesis argues that the particular characteristics of CMP systems can be exploited to improve energy efficiency and performance in the memory hierarchy. The research presented here investigates several mechanisms to optimize the performance of the CMP memory system. Specifically, we target three problems: 1) designing an efficient multicasting algorithm to improve the performance of the on-chip network, 2) exploiting emerging non-volatile memories to design low-power cache memory for CMP systems, and 3) exploiting thread progress information to design high-performance cache coherence protocols.

For the first research topic, we propose an efficient multicast routing mechanism for on-chip networks. For CMP systems with increasing core counts, the on-chip network provides an efficient and scalable interconnection paradigm, and one-to-many (multicast) communication is ubiquitous on such platforms. Without efficient multicasting support, traditional unicast on-chip networks handle such multicast communication inefficiently. In this thesis, we propose dual partitioning multicasting (DPM), which significantly reduces packet latency and on-chip network power dissipation. Specifically, the DPM scheme adaptively makes routing decisions based on the network load-balance level as well as the link sharing patterns characterized by the distribution of the multicast destinations.

For our second research topic, we propose to exploit emerging non-volatile memory, such as spin-torque transfer RAM (STT-RAM), to design low-power cache memories. STT-RAM has fast read access, high storage density, and negligible leakage power. However, the wide adoption of STT-RAM as cache memory is impeded by its long write latency and high write power.
The write performance of STT-RAM can be improved by relaxing the retention time of its storage cell, the magnetic tunnel junction (MTJ). The resulting volatile STT-RAM needs to be refreshed periodically to prevent data loss. When it is used as the large last-level cache in CMP systems, the frequent refresh operations can dissipate significant extra energy. In addition, the refreshes can conflict severely with normal read/write operations and degrade overall system performance. In this thesis, we propose cache coherence enabled adaptive refresh (CCear), which minimizes the number of refresh operations on volatile STT-RAM by interacting with the cache coherence protocol and the cache management policy.

Finally, we propose an efficient coherence adaption mechanism to improve the performance of cache coherence protocols in CMP systems. One primary objective of a CMP system is to speed up application execution by exploiting thread-level parallelism. In such systems, threads typically exhibit unbalanced progress stemming from unequal cache misses or task assignment. Load imbalance is one of the biggest roadblocks to parallel application performance. Because of the inherent synchronization primitives, such as barriers and locks, cores running fast threads waste precious cycles waiting for slow cores. In this thesis, we propose thread progress aware coherence adaption (TEACA), which uses thread progress information as hints to adapt hybrid coherence protocols. Specifically, TEACA fuses memory system statistics to estimate the progress of threads. Based on the estimated thread progress, TEACA dynamically categorizes threads into leader threads and laggard threads. The thread categorization decisions are then leveraged for efficient coherence adaption in hybrid coherence protocols.
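
The leader/laggard categorization described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the `ThreadStats` fields, the miss penalties, and the median cutoff are hypothetical stand-ins for whatever statistics and thresholds TEACA actually uses. The sketch only shows the general idea of fusing per-thread memory statistics into one progress estimate and splitting threads into two groups.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ThreadStats:
    """Hypothetical per-thread memory statistics sampled over one interval."""
    thread_id: int
    l1_misses: int
    l2_misses: int


def estimate_stall(stats: ThreadStats, l1_penalty: int = 10, l2_penalty: int = 100) -> int:
    """Fuse miss counts into one estimated stall-cycle total (assumed weights)."""
    return stats.l1_misses * l1_penalty + stats.l2_misses * l2_penalty


def categorize(threads: list[ThreadStats]) -> dict[int, str]:
    """Label a thread 'laggard' if its estimated stall exceeds the median, else 'leader'."""
    stalls = {t.thread_id: estimate_stall(t) for t in threads}
    cutoff = median(stalls.values())
    return {tid: ("laggard" if stall > cutoff else "leader") for tid, stall in stalls.items()}


# Illustrative sample values only; they are not measurements from the thesis.
samples = [
    ThreadStats(0, l1_misses=100, l2_misses=5),
    ThreadStats(1, l1_misses=400, l2_misses=60),
    ThreadStats(2, l1_misses=120, l2_misses=8),
    ThreadStats(3, l1_misses=350, l2_misses=40),
]
labels = categorize(samples)
```

In this sketch, threads 1 and 3 accumulate the most estimated stall cycles and are labeled laggards. A hybrid coherence protocol could then use such labels as hints when choosing a policy per thread, but how TEACA maps labels to protocol actions is not specified in this abstract.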

Keywords/Search Tags:

Chip Multiprocessor, On-Chip Networks, Cache Coherence, STT-RAM, Multicast Routing, Cache Memory, Network Partition, Hybrid Cache

Related items

1	Cache Coherence Techniques For Chip Multiprocessor Architecture
2	On-chip Network Routing Optimization For Multicore Cache Coherence
3	High Performance Network-on-Chip For Cache Coherence Optimization
4	Research On The Key Techniques Of Routing Algorithm And Flow Control Optimizations For Cache-Coherent Networks-on-Chip
5	Smart Directory Cache For Multi-Many-Core Systems
6	Research On Energy Optimization Method For On-chip Cache Memory Subsystem
7	Key Research Issues Of Memory Architecture For Three Dimensional Multi-Core Processors
8	Analysis And Implementation Of Cache Coherence Protocols For CMP
9	Research On Cache Coherence Protocols Based On Data Sharing Characteristics
10	Research And Design Of Cache Coherence For Multi-core Processors