Design Of Dynamic Reconfigurable Cache Coherence Mechanism In Manycore Processor

Posted on:2017-09-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Han

Full Text:PDF

GTID:1368330590491062

Subject:Electronic Science and Technology

Abstract/Summary:

Over the past several decades, the number of cores integrated into a single processor has kept increasing, and manycore designs have become a major trend in processor development. Traditional interconnects such as buses cannot meet the requirements of manycore processors, so the network-on-chip (NoC) has become the most widely used interconnect in manycore processors because of its high concurrency and throughput. As an unordered interconnect, however, a NoC makes cache coherence harder to design, in terms of both complexity and hardware cost. Broadcasts in cache coherence protocols also limit NoC performance and inter-core communication, a problem known as the coherence wall. To improve cache coherence performance, this thesis studies three topics related to cache coherence partitioning.

1) Dynamic scalable cache coherence partitioning (SCCP). As manycore processors and their NoCs scale up, cache coherence generates large numbers of broadcasts, which dramatically limit inter-core communication performance. Existing work focuses on reducing the hop count of coherence transactions, which increases the design complexity of the coherence protocol. Because the speedup of applications running on more than sixteen cores is limited, partitioning becomes a possible mechanism for improving cache performance. This thesis proposes a dynamic scalable partitioning scheme for cache coherence. Simulation in the gem5 simulator shows that SCCP improves cache system performance by 18.8% at a hardware cost overhead of 1.67%, compared with 0.98% for the Token protocol. The average overall speedup is 9%, similar to the performance of the DiCo protocol, whose on-chip storage overhead is 3.30%.

2) Dynamic subnetting for the irregular topologies produced by scalable cache partitioning. Subnetting can reduce the penalty of broadcasts in the NoC, which benefits cache coherence. At the same time, dynamic scalable cache coherence partitioning produces irregular topologies. Traditional subnetting mechanisms use most-fitted subnetting (MFS) topologies. MFS reduces the number of broadcasts, but it also reduces the number of links available within a subnet, which can increase the average load on each link. This thesis proposes a subnetting mechanism that covers each logical subnet with a physical subnet. Physical subnets are formed from a series of regular topologies and supply the routing paths for logical subnets. Simulation shows that the proposed subnetting mechanism improves NoC performance by 10% when broadcasts make up 5% to 10% of the traffic. The hardware cost is low, because only two bits are required in each NoC router.

3) Reconfigurable cache system with message passing support (RMCC). To address the coherence wall, message passing has been introduced into manycore processors, improving performance by 13%. However, manycore processors widely use lightweight cores that run only one thread at a time, so their message passing buffers (MPBs) are wasted while the core is computing. To improve the utilization of on-chip SRAM, this thesis proposes the RMCC mechanism, in which on-chip SRAM can be reused either as cache memory or as MPBs. In simulation, RMCC improves overall performance by 11.4%, compared with a separate MPB design that carries a 5.26% hardware cost overhead. At the same time, RMCC achieves the same performance as the separate MPB mechanism without its 5.26% SRAM overhead.

In summary, to improve the performance of data sharing and inter-core communication in manycore processors, this thesis proposes scalable cache coherence partitioning for manycore processors, an efficient subnetting mechanism for irregular topologies, and a reconfigurable MPB integrated with cache coherence.
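The claim that partitioning relieves the coherence wall rests on a simple scaling effect: a broadcast confined to a coherence domain reaches fewer routers and crosses fewer links. The Python sketch below is a toy model of that effect only, not the SCCP mechanism itself. It assumes a hypothetical 8x8 mesh, a hypothetical 4x4 partition containing the requester, and broadcasts sent as one unicast copy per destination over minimal (XY) routes; the helper names `mesh_nodes` and `unicast_broadcast_hops` are illustrative and do not come from the thesis.

```python
from itertools import product


def mesh_nodes(x0, y0, width, height):
    """All (x, y) router coordinates in a width x height rectangle at (x0, y0)."""
    return [(x, y) for x, y in product(range(x0, x0 + width), range(y0, y0 + height))]


def unicast_broadcast_hops(src, region):
    """Link traversals for a broadcast sent as one unicast copy per destination.

    Toy assumption: minimal (e.g. XY) routing, so each copy crosses exactly
    the Manhattan distance between the source and its destination router.
    """
    sx, sy = src
    return sum(abs(x - sx) + abs(y - sy) for x, y in region if (x, y) != (sx, sy))


def main():
    src = (1, 1)                          # hypothetical requesting core
    full_chip = mesh_nodes(0, 0, 8, 8)    # hypothetical 64-core 8x8 mesh
    partition = mesh_nodes(0, 0, 4, 4)    # hypothetical 16-core coherence domain
    full = unicast_broadcast_hops(src, full_chip)
    part = unicast_broadcast_hops(src, partition)
    print(f"destinations: {len(full_chip) - 1} vs {len(partition) - 1}")
    print(f"link traversals: {full} vs {part}")


if __name__ == "__main__":
    main()
```

For a source at router (1, 1), the full-chip broadcast reaches 63 destinations over 352 link traversals, while the partitioned broadcast reaches 15 destinations over 32. A tree-based multicast would lower both traversal counts, but the partition still bounds each broadcast to the routers inside its own domain.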

Keywords/Search Tags:

Manycore, Cache, Coherence Protocol, Network-on-Chip, Dynamic reconfiguration, Subnetting, Broadcasting, Message passing, Deadlock-free routing algorithm

Related items

1	High Performance Network-on-Chip For Cache Coherence Optimization
2	Research And Implementation Of The Cache Coherence Protocol For The Large Scale System Of The SMP-based CC-NUMA Category
3	On-chip Network Routing Optimization For Multicore Cache Coherence
4	Dynamically reconfigurable on- and off-chip networks
5	Research On Deadlock-free On-chip Routing Algorithm
6	Research On High-Performance And Deadlock-Free Routing Algorithms In Mesh-Based Networks-on-chip
7	Optimizations Of Memory Subsystem For Chip Multiprocessor Systems
8	Research On The Key Techniques Of Routing Algorithm And Flow Control Optimizations For Cache-Coherent Networks-on-Chip
9	The Deadlock-free Routing Restrictions And Deadlock-free Routing Algorithms On Star Graph
10	Cache Coherence Techniques For Chip Multiprocessor Architecture