
Cache Coherence Techniques For Chip Multiprocessor Architecture

Posted on: 2014-02-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G M Li    Full Text: PDF
GTID: 1228330395958597    Subject: Computer system architecture
Abstract/Summary:
Many modern computer systems and most multicore chips support shared memory in hardware. In a shared-memory system each processor core typically has a private cache, so the system needs a hardware cache coherence model to prevent incoherence among the private caches. Prior research on cache coherence has focused mainly on multiprocessor systems; for chip multiprocessor architectures the techniques and hardware constraints differ, and several new problems arise when a coherence model is implemented on a single chip. In particular, extending the function of cache coherence to support hardware transactional memory and thread-level speculation creates opportunities to simplify parallel programming. As a result, the functional extension and performance optimization of the cache coherence model are attracting growing attention from both academia and industry.

This dissertation focuses on the functional extension and performance optimization of the cache coherence model. By exploiting the model's potential in both function and performance, the transistor resources on a single chip can be used more fully, and performance is improved while the scalability of the chip multiprocessor is preserved, making the design suitable for future multicore and manycore architectures. The major research contributions are as follows.

The first line of research addresses the functional extension of the cache coherence model. The goal is an extended coherence model that supports hardware transactional memory and thread-level speculation concurrently and efficiently. Both mechanisms can be implemented by extending the function of cache coherence, and because their hardware requirements are similar, a single unified hardware structure can support them simultaneously. Given the limited hardware resources and energy budget of a single chip and the growing performance demands of modern applications, the challenge is to extend the coherence model efficiently. The major contributions are: (1) TT-Dir, a hardware mechanism built on the cache coherence model that supports both transactional memory and thread-level speculation; (2) a fast rollback mechanism in TT-Dir that addresses the slow-abort problem of eager version management; and (3) a conflict-tolerant mechanism that tolerates write-after-write and write-after-read data dependences in thread-level speculation, together with an efficient transaction-ordering mechanism that allows the conflict-tolerant mechanism to be used by the transactional memory system as well. A minimal illustrative sketch of coherence-based conflict detection is given below.

The second line of research addresses the optimization of the cache coherence model in two respects: performance and scalability. To keep pace with the ever-increasing throughput and performance demands of modern applications, the coherence models of the commercial servers that run them must deliver higher performance. Moreover, as the number of cores on a single chip grows, the scalability of directory-based coherence becomes a new problem because of the directory's storage overhead and dynamic energy.
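To make the coherence-based conflict detection assumed by such a unified design concrete, the following C++ sketch shows how a per-core read set and write set can be checked against incoming coherence requests. It is a minimal illustration only: the names (SpecContext, onRemoteRequest, and so on) and the eager abort-on-conflict policy are assumptions made for exposition, not the TT-Dir design itself.

// Illustrative sketch only: how transactional memory and thread-level
// speculation can piggyback conflict detection on coherence traffic.
// All names here are hypothetical and are not taken from TT-Dir.
#include <cstdint>
#include <unordered_set>

enum class Req { GetShared, GetExclusive };   // incoming coherence requests

struct SpecContext {                          // per-core speculative state
    std::unordered_set<uint64_t> readSet;     // lines read speculatively
    std::unordered_set<uint64_t> writeSet;    // lines written speculatively
    bool aborted = false;

    void trackRead(uint64_t line)  { readSet.insert(line); }
    void trackWrite(uint64_t line) { writeSet.insert(line); }

    // Called when another core's coherence request reaches this cache.
    // A remote write that hits our read or write set, or a remote read
    // that hits our write set, is a conflict; eager resolution simply
    // aborts the local transaction or speculative thread.
    void onRemoteRequest(Req type, uint64_t line) {
        bool conflict =
            (type == Req::GetExclusive &&
             (readSet.count(line) || writeSet.count(line))) ||
            (type == Req::GetShared && writeSet.count(line));
        if (conflict) abort();
    }

    void abort() {                            // discard speculative state
        readSet.clear();
        writeSet.clear();
        aborted = true;                       // hardware would also roll back data
    }

    void commit() {                           // make speculative writes visible
        readSet.clear();
        writeSet.clear();
    }
};

A conflict-tolerant design in the spirit of contribution (3) would refine onRemoteRequest so that write-after-write and write-after-read hits are ordered or forwarded rather than always triggering an abort.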
This dissertation proposes optimizations of the cache coherence model for both performance and scalability. The implementation of the coherence model is first divided into two levels, a network-on-chip (NoC) related level and a behavior-related level, and the model is explored and investigated from both views. The major contributions are: (1) At the NoC-related level, exploiting observations of how coherence messages occur in the network, we propose the PPB (Phased-Priority Based) cache coherence model. PPB decouples a coherence transaction and introduces the notion of a message "phase", which is treated as the priority of the message. We also present a detailed implementation of PPB, including the allocation of phase numbers and priority-based arbiters in the on-chip network; an illustrative sketch of such an arbiter is given at the end of this abstract. (2) At the behavior-related level, we propose the Loc-Dir coherence model, a hardware optimization for both performance and scalability. Loc-Dir adopts a two-level directory organization to reduce the hardware cost of the directory, which benefits the scalability of the whole system. It then introduces a prediction mechanism that enables direct access among the private caches on a chip, addressing the indirection problem of directory-based coherence. Within this prediction mechanism we propose a new pattern-based predictor that scales the space of addresses that can be predicted. Finally, we further optimize Loc-Dir based on the sharing patterns observed in applications and adapt the private-cache replacement policy to suit the prediction mechanism.

Based on this work, the following conclusions are drawn: (1) The cache coherence model can be used by parallel programming models to simplify the management of shared data. (2) The NoC and the cache coherence mechanism both coordinate with and constrain each other: coherence messages add traffic to the NoC, and the speed at which coherence messages are delivered directly affects the performance of the coherence model, so a new interface is needed to manage the NoC and the coherence model in a coordinated way. (3) For directory-based coherence, hierarchy combined with inclusion enables the storage cost to scale efficiently, and data sharing patterns can be exploited to improve the performance of the cache coherence model.
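As an illustration of the phased-priority idea at the NoC-related level, the following C++ sketch shows a priority-based arbiter in which each coherence message carries a phase number that serves as its priority; among requests with equal phase, grants rotate round-robin. The message fields, the "higher phase wins" policy, and the class names are assumptions made for this sketch and do not reproduce the actual PPB phase-allocation scheme.

// Illustrative sketch only: a priority-based arbiter of the kind PPB relies
// on. The tie-breaking and "higher phase wins" policy are assumptions.
#include <cstddef>
#include <optional>
#include <vector>

struct CoherenceMsg {
    unsigned phase;   // phase number assigned when this step of the
                      // coherence transaction is issued
    // ... payload (address, request type, destination) omitted
};

class PhasePriorityArbiter {
public:
    explicit PhasePriorityArbiter(std::size_t numPorts)
        : lastGranted_(0), numPorts_(numPorts) {}

    // Pick one message among those competing for the same output port.
    // 'requests' has one slot per input port; empty slots are idle ports.
    std::optional<std::size_t>
    grant(const std::vector<std::optional<CoherenceMsg>>& requests) {
        std::optional<std::size_t> winner;
        for (std::size_t i = 0; i < numPorts_; ++i) {
            // Scan starting after the last granted port so that requests
            // with equal phase are served round-robin (no starvation).
            std::size_t p = (lastGranted_ + 1 + i) % numPorts_;
            if (!requests[p]) continue;
            if (!winner || requests[p]->phase > requests[*winner]->phase)
                winner = p;
        }
        if (winner) lastGranted_ = *winner;
        return winner;   // index of the granted input port, if any
    }

private:
    std::size_t lastGranted_;
    std::size_t numPorts_;
};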
Keywords/Search Tags: chip multiprocessors, cache coherence, transactional memory, thread-level speculation, priority-based network-on-chip, localized directory