Research On Key Techniques Of Bufferless Router For Network-on-chip

Posted on: 2013-05-14    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C C Feng
GTID: 1268330392473853    Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of microelectronic techniques, chip design has entered the multicore era. Due to the increasing number of cores on a single chip, communication between cores has become the performance bottleneck of the multicore System-on-Chip (SoC). Network-on-Chip (NoC), as an alternative to the classical bus or crossbar interconnection architecture, has become a scalable and high-bandwidth communication paradigm, which solves the global communication problem for large-scale multicore SoCs and effectively improves the performance of on-chip communication. However, as integration density increases, power consumption and area have become limiting constraints in the design of multicore SoCs. In addition, shrinking feature sizes, lower supply voltages and higher frequencies have a negative impact on the reliability of NoCs. Thus, an energy-efficient, low-overhead and highly reliable NoC is especially desirable for large-scale multicore SoCs.

A bufferless router provides a low-overhead solution for NoC. In a bufferless router, no buffers are needed other than the pipeline registers, which significantly reduces power consumption and area overhead and also simplifies the design. However, the serialized switch allocator in existing bufferless routers limits performance. Furthermore, the lack of reliability design in bufferless routers makes it difficult to handle faults in complicated situations. Thus, this dissertation investigates performance optimization and reliability design for the bufferless router microarchitecture. The main contributions of this dissertation are as follows:

1. Performance analysis for deflection routing and bufferless router based on a permutation network

The thesis designs deflection routing algorithms for various NoC topologies and conducts performance evaluations using different synthetic traffic patterns.
The evaluation results illustrate that the performance of deflection routing is sensitive to the network topology and traffic pattern. The NoC architect should therefore choose a suitable NoC topology for the specific application when designing a bufferless NoC. For the universal 2D Mesh NoC topology, the thesis proposes a 1-cycle high-performance bufferless router based on a permutation network (called BLESS_PERM). The BLESS_PERM router replaces the serialized switch allocator and crossbar with a simple 2-level permutation network, which reduces the number of logic levels on the critical path, reduces design complexity and raises the clock frequency. Simulation results illustrate that the BLESS_PERM router achieves 70%, 65%, 56% and 41% less average packet latency than the VC, BLESS_BASE, BLESS_PL and CHIPPER routers respectively under synthetic traffic workloads, and 80%, 72%, 66% and 38% less average packet latency than those four routers respectively under real application workloads.

2. Fault-tolerant architecture for bufferless router

For the reliability design of the bufferless router, the thesis proposes a complete fault-tolerant architecture, which can detect and handle both transient and permanent link faults. The fault-tolerant architecture includes:

- An on-line fault detection mechanism using SECDED block coding, which can detect and distinguish transient faults from permanent faults without interfering with normal packet transmission.
- A hybrid automatic repeat request (ARQ) and forward error correction (FEC) fault-tolerant flow-control scheme to handle transient faults occurring in packets at the link level.
- Two fault-tolerant deflection routing algorithms to route packets around permanent link faults at the network layer.
The Fault-on-Neighbor (FoN) aware deflection routing algorithm, which can tolerate convex and concave fault regions without two concave points in sequence, makes routing decisions based on the 2-hop fault information transmission model and the fault region shape, without deadlock or livelock. The reconfigurable fault-tolerant deflection routing algorithm (FTDR) based on reinforcement learning, which can handle irregular fault regions, uses a reinforcement learning method to reconfigure the routing table to achieve fault tolerance. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router.
- A fault-tolerant deflection router with reconfigurable bidirectional links (called BiFTDR). The BiFTDR router reconfigures the direction of the bidirectional links between neighboring routers according to the link status and incoming packet information, which can handle the unidirectional fault model without bypassing.

3. High-performance and fault-tolerant deflection-routing-based multicast schemes

The thesis proposes three high-performance deflection-routing-based multicast (DRM) schemes. The DRM_noPR scheme is a simple multicast scheme, which selects the productive direction based on the best candidate. In the DRM_noPR scheme, the multicast packet is routed to each destination along a dynamic path. The DRM_PR_src and DRM_PR_all schemes replicate multicast packets according to a region partition rule and the busy or free status of the output ports, which increases the diversity of multicast paths and reduces multicast latency. Furthermore, in order to improve the reliability of multicast communication, fault-tolerant DRM schemes (FT_DRM) are proposed based on the three DRM schemes. The FT_DRM schemes reconfigure the routing table based on a reinforcement learning method and route multicast packets around permanent link faults without any packet loss.
Experimental results show that in the network without faulty links the DRM_PR_src scheme achieves 18% less average packet latency than the DRM_noPR scheme, and the DRM_PR_all scheme achieves 40% and 27% less average packet latency than the DRM_noPR and DRM_PR_src schemes respectively. In networks where 5% and 10% of the total links are faulty, the DRM_PR_src scheme achieves 17% less average packet latency than the DRM_noPR scheme, and the DRM_PR_all scheme achieves 38% less average packet latency than the DRM_noPR scheme.

4. Bufferless router for 3D NoC

As the bufferless router extends from 2D to 3D, the performance of the router degrades further because of the serialized output port allocation. The thesis proposes a 1-cycle high-performance 3D bufferless router based on a 3-level permutation network (called 3D_PERM). The 3D_PERM router uses a 3-level permutation network to replace the serialized switch allocator and a 7×7 crossbar, which improves performance and reduces hardware overhead. Simulation results demonstrate that the 3D_PERM router achieves 73% and 14% less average packet latency than the 3D_BASE and 3D_CHIPPER routers respectively under synthetic traffic workloads, and 78% and 14% less average packet latency than those two 3D bufferless routers respectively under real application workloads. To address the low yield of TSV manufacturing technology in 3D ICs, the thesis proposes a low-overhead fault-tolerant deflection router (called FTDR-3D_OPT) for 3D Mesh NoC. The FTDR-3D_OPT router uses a layer routing table and two TSV state vectors to make efficient routing decisions that avoid both horizontal and vertical link faults. Synthesis results demonstrate that the area and power consumption of the FTDR-3D_OPT router are 40% and 49% less than those of a 3D fault-tolerant deflection router with a global routing table.
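
All of the routers summarized above rely on deflection routing: a bufferless router cannot store a flit, so every flit that arrives in a cycle must leave on some output port in that cycle, even if that port does not move it closer to its destination. The following Python sketch illustrates the serialized style of port allocation that the abstract describes as a performance limit. It is not the dissertation's design. The oldest-first priority, the four-port 2D mesh model without local injection or ejection, and all names (`Flit`, `productive_ports`, `allocate`) are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

# Output ports of a 2D mesh router and the coordinate step each one takes.
PORTS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


class Flit(NamedTuple):
    age: int                # cycles spent in the network (assumed priority key)
    dest: Tuple[int, int]   # destination router coordinates


def productive_ports(here: Tuple[int, int], dest: Tuple[int, int]) -> List[str]:
    """Ports that reduce the distance from `here` to `dest`."""
    dx, dy = dest[0] - here[0], dest[1] - here[1]
    ports = []
    if dx > 0:
        ports.append("E")
    if dx < 0:
        ports.append("W")
    if dy > 0:
        ports.append("N")
    if dy < 0:
        ports.append("S")
    return ports


def allocate(here: Tuple[int, int], flits: List[Flit]) -> List[Tuple[Flit, str, bool]]:
    """Serially assign one output port per flit, oldest flit first.

    Returns (flit, port, was_productive) triples. A flit whose productive
    ports are all taken is deflected to the first remaining free port.
    """
    if len(flits) > len(PORTS):
        raise ValueError("more flits than output ports")
    free = list(PORTS)
    result = []
    for flit in sorted(flits, key=lambda f: f.age, reverse=True):
        wanted = [p for p in productive_ports(here, flit.dest) if p in free]
        port = wanted[0] if wanted else free[0]   # deflect if nothing useful is free
        free.remove(port)
        result.append((flit, port, bool(wanted)))
    return result


if __name__ == "__main__":
    flits = [Flit(5, (3, 1)), Flit(3, (2, 1)), Flit(9, (1, 0))]
    for flit, port, ok in allocate((1, 1), flits):
        print(flit, "->", port, "productive" if ok else "deflected")
```

In this sketch each flit's choice depends on the ports taken by every higher-priority flit, so the decisions form a sequential chain. That dependency is the kind of serialization the permutation-network routers in the abstract are meant to avoid.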
Keywords/Search Tags: Network-on-Chip, Bufferless Router, Deflection Routing, Permutation Network, Fault-Tolerance, Reinforcement Learning, Multicast, 3D Topology