Smart Directory Cache For Multi-/Many-Core Systems

Posted on: 2015-07-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Fang
GTID: 1228330467979397
Subject: Signal and Information Processing
Abstract/Summary:
Thanks to technology scaling, multiple processors can now be integrated on a single chip. Synchronization and communication between these processors are key to the efficiency of chip multiprocessors (CMPs). A shared-memory programming interface is crucial for exploiting the performance potential of CMPs, and it opens new opportunities for holistic on-chip data and coherence solutions. As the number of processors grows, straightforward snooping-based cache coherence becomes less suitable. Directory-based coherence has been the standard solution for large-scale shared-memory multiprocessors and is a clear candidate for maintaining on-chip coherence. However, the aggregate area of a conventional directory cache grows quickly as the number of processors increases. In this dissertation, we exploit the fine-grained access behavior of applications and propose an expressive, area-efficient, and effective on-chip memory subsystem for shared-memory CMPs that addresses three issues: 1) reducing the size of each directory entry, 2) reducing the number of directory entries, and 3) designing a protocol for distributed data and metadata caches.
First, we propose a directory cache with a hybrid representation to reduce the size of each directory entry. Traditional schemes allocate a full sharer vector for every entry, so the entry size grows linearly with the number of processors. We examine the characteristics of the directory cache from a new perspective and allocate different structures to different entries. In particular, because much of the data is private, we combine single-pointer entries with full-vector entries. Simulations of a 64-core CMP show that up to 93.75% of entries can use the single-pointer format. Directory storage is reduced by 2.7X, while execution time, on-chip network traffic, and energy each increase by less than 0.6%.
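The hybrid representation can be illustrated with a minimal Python sketch. It assumes that an entry starts in single-pointer form and is promoted to a full sharer bit vector when a second distinct core shares the block. The class `HybridDirEntry`, the helper `sharer_bits`, and this promotion rule are hypothetical illustrations, not the dissertation's actual hardware design, and tag and state bits are ignored.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HybridDirEntry:
    """Hypothetical hybrid directory entry: a single sharer pointer while the
    block is private, promoted to a full sharer bit vector once shared."""
    n_cores: int
    pointer: Optional[int] = None  # sole sharer while the block is private
    vector: Optional[int] = None   # full sharer vector: bit i set => core i

    def is_pointer(self) -> bool:
        return self.vector is None

    def add_sharer(self, core: int) -> None:
        if self.vector is not None:
            self.vector |= 1 << core           # already in full-vector form
        elif self.pointer is None or self.pointer == core:
            self.pointer = core                # still private: keep the pointer
        else:
            # A second distinct sharer arrives: promote to a full vector.
            self.vector = (1 << self.pointer) | (1 << core)
            self.pointer = None

    def sharers(self) -> List[int]:
        if self.vector is not None:
            return [i for i in range(self.n_cores) if (self.vector >> i) & 1]
        return [] if self.pointer is None else [self.pointer]


def sharer_bits(n_cores: int, pointer_format: bool) -> int:
    """Width of the sharer field only; tag and state bits are not modeled."""
    return math.ceil(math.log2(n_cores)) if pointer_format else n_cores


if __name__ == "__main__":
    entry = HybridDirEntry(n_cores=64)
    entry.add_sharer(5)
    print(entry.is_pointer(), entry.sharers())   # True [5]
    entry.add_sharer(17)
    print(entry.is_pointer(), entry.sharers())   # False [5, 17]
    print(sharer_bits(64, True), sharer_bits(64, False))  # 6 64
```

For 64 cores, the pointer form needs a 6-bit sharer field versus 64 bits for a full vector, which is why a high proportion of single-pointer entries shrinks the directory. The sketch does not model demotion back to pointer form after invalidations.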
Schemes that instead rely on compact sharer structures cause 2.5% performance degradation while reducing directory storage by only 1.7X.
Second, we propose multi-granular tracking to reduce the number of directory entries. In the most basic design, one directory entry is allocated per cache line. In practice, however, consecutive memory blocks in a region often share the same access pattern and can be described by a single region entry, while the exceptional lines whose access pattern differs are tracked by line entries. Multi-granular tracking therefore reduces the number of entries without introducing extra false sharing. The type and size of each region entry change dynamically to match the access pattern of the data region. Simulations of a 64-core CMP show that multi-granular tracking reduces the number of directory entries by 10X with about 0.5% performance degradation, whereas other schemes would lose 7.5% performance to achieve the same compression ratio. Combining the hybrid representation with multi-granular tracking reduces directory storage by 22X while increasing execution time by only 0.3%.
Finally, we propose an effective metadata cache that improves both data access and coherence maintenance by exploiting fine-grained access patterns. The metadata records the coherence information and access pattern of the data. Instead of a fixed address-based mapping, data and metadata are delegated and replicated according to the access pattern, which accelerates data access and coherence activity and reduces network traffic and energy. Simulations of a 64-core CMP show that, compared with shared-cache schemes, the execution time, network traffic, and energy of the memory subsystem are reduced by 10.5%, 34.7%, and 23.7%, respectively, with 4.7X less storage overhead. As a component of the CMP, the metadata cache can also improve data prefetching, reducing execution time by a further 5.1%.
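The region-plus-exception idea behind multi-granular tracking can be sketched in Python as follows. The sketch assumes a fixed 16-line region, 64-byte lines, and sharer sets modeled as `frozenset`s. `RegionEntry`, `MultiGranularDirectory`, and `update` are hypothetical names, and the adaptive region type and size described above are not modeled. A line is covered by its region entry only if its bit in `present` is set, so a line that deviates from the region's pattern gets its own line entry instead of widening the region's sharer set.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

LINE_BYTES = 64     # hypothetical cache-line size in bytes
REGION_LINES = 16   # hypothetical fixed region size (lines per region entry)


@dataclass
class RegionEntry:
    sharers: FrozenSet[int]  # access pattern common to the covered lines
    present: int = 0         # bit i set => line i of the region follows it


@dataclass
class MultiGranularDirectory:
    regions: Dict[int, RegionEntry] = field(default_factory=dict)
    lines: Dict[int, FrozenSet[int]] = field(default_factory=dict)  # exceptions

    @staticmethod
    def _locate(addr: int) -> Tuple[int, int, int]:
        line = addr // LINE_BYTES
        return line, line // REGION_LINES, line % REGION_LINES

    def sharers(self, addr: int) -> FrozenSet[int]:
        line, region, offset = self._locate(addr)
        if line in self.lines:              # a line entry overrides its region
            return self.lines[line]
        entry = self.regions.get(region)
        if entry is not None and (entry.present >> offset) & 1:
            return entry.sharers
        return frozenset()                  # untracked line: no cached copies

    def update(self, addr: int, sharers: FrozenSet[int]) -> None:
        line, region, offset = self._locate(addr)
        self.lines.pop(line, None)          # drop any old exception entry
        entry: Optional[RegionEntry] = self.regions.get(region)
        if entry is not None:
            entry.present &= ~(1 << offset)  # detach the line from the region
            if entry.present == 0:           # region entry covers nothing now
                del self.regions[region]
                entry = None
        if not sharers:
            return                           # nothing cached: nothing to track
        if entry is None:
            self.regions[region] = RegionEntry(sharers, 1 << offset)
        elif entry.sharers == sharers:
            entry.present |= 1 << offset     # same pattern: absorbed by region
        else:
            self.lines[line] = sharers       # different pattern: line entry

    def entry_count(self) -> int:
        return len(self.regions) + len(self.lines)


if __name__ == "__main__":
    d = MultiGranularDirectory()
    for i in range(REGION_LINES):                 # 16 lines private to core 3
        d.update(i * LINE_BYTES, frozenset({3}))
    print(d.entry_count())                        # 1
    d.update(5 * LINE_BYTES, frozenset({3, 7}))   # one line becomes shared
    print(d.entry_count())                        # 2
    print(sorted(d.sharers(6 * LINE_BYTES)))      # [3]
```

In the demo, sixteen lines private to core 3 need one region entry instead of sixteen line entries. When one line gains a second sharer, it costs one extra line entry, and its neighbors still report only core 3, so the coarser tracking does not add false sharers.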
Keywords/Search Tags: Chip multiprocessor, shared memory subsystem, cache coherence, directory cache, metadata cache