Research On Key Techniques Of High Performance Router Microarchitecture For Networks-on-Chip

Posted on: 2012-05-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S B Qi
Full Text: PDF
GTID: 1118330362960512
Subject: Electronic Science and Technology
Abstract/Summary:
As technology scales down, hundreds of processor cores can be integrated on a single chip, while global wire delays increase even as gate delays shrink. Conventional interconnect architectures such as buses, ad hoc wires and crossbars are limited in bandwidth, scalability, area and global wire delay, and cannot satisfy the requirements of on-chip interconnection. NoCs (Networks on Chip) are becoming the most promising interconnects because of their better scalability, predictable wire length and delay, high bandwidth and reusability. Scientific and commercial applications need on-chip interconnects with low latency and high throughput. Networks have been studied widely for parallel computers and the Internet, but NoCs differ from those two types of network in several ways: router latency is the main component of network latency in an NoC; an NoC has richer wire resources and less buffer capacity; and an NoC faces very tight power and area budgets. The research in this thesis is based on these differences and focuses on low-latency router architecture, pipelined physical links, low-area-overhead routers, multicast, and low-power router design.

The primary innovative works in this thesis are as follows:

1. Adaptive CDB (Channel Double Buffer). CDBs replace the link registers to implement a pipelined physical link. Ready-valid handshake flow control is adopted between CDBs and routers. The link applies local congestion control, and the CDBs in the link can buffer flits when the downstream buffer cannot accept a flit, which is equivalent to increasing the capacity of the router's input buffer. The CDB delay model, built on the theory of logical effort, shows that the critical-path delay is sensitive to the link width and that the register overhead is the main component of the critical path. The pipeline depth based on CDBs is sensitive to the wire type, wire length and clock cycle.
Compared with implementing the link pipeline by inserting registers, the increase in pipeline depth with CDBs is not significant.

2. DVOQR (Dynamic Virtual Output Queue Router) with dynamic buffer allocation based on CDB. Through VOQ (Virtual Output Queue), a look-ahead routing computation strategy, dynamic buffer allocation and VOAQ (Virtual Output Address Queue), the UDB (Unified Dynamic Buffer) read, routing computation and switch allocation can be performed in parallel, which reduces the router pipeline to two stages. The dynamic buffer allocation strategy makes efficient use of the limited on-chip buffer resources. With one quarter of the buffer of the virtual channel router, DVOQR achieves the same throughput as the virtual channel router under random traffic. The delay model based on logical effort theory shows that the critical-path delay is more sensitive to the number of ports. Synthetic workload simulations show that, in a 4x4 mesh network under random traffic, the throughput of DVOQR is 46.9% and 28.5% higher than that of the wormhole router and the virtual channel router, respectively; the throughput of DVOQR is still 1.9% higher than that of a virtual channel router with twice the buffer capacity of DVOQR, and almost the same as that of a virtual channel router with four times the buffer capacity of DVOQR, under the same input speedup. Application workload simulations show that the average network delay of DVOQR, the wormhole router and the virtual channel router is 6.6%, 50.9% and 94.6% higher, respectively, than that of the ideal router.

3. BEA-BLESS (Based on Encoding Allocation BufferLESS router) with low area overhead. BEA-BLESS has no input buffers and therefore reduces the chip area required by the NoC. FBEA-BLESS is the flit-switching router and PBEA-BLESS is the packet-switching router. The encoding-based allocation strategy adopted by BEA-BLESS shortens the router's critical path and raises the router's operating frequency.
The frequency of BEA-BLESS is twice that of B-BLESS (Base BufferLESS router). Livelock is avoided by the GoSS (Go-Stop-Steer) strategy. The PBEA-BLESS router can use a small buffer to eliminate the reorder buffer at the receiving end. Livelock and starvation can be avoided by an improved GoSS strategy. Application workload simulations show that the average network delay of BEA-BLESS is 29.4% lower than that of B-BLESS, and that the buffer capacity needed to support packet switching is only 33.3% of the capacity of the reorder buffer.

4. Multicast router with load balancing based on DVOQR. A network throughput model for multicast is established, drawing on the throughput model for unicast. BDOR (Balanced Dimension Order Routing) and MPDOR (Minimal Path Dimension Order Routing), proposed in the thesis, are load-balancing routing algorithms. SM-DVOQR (Supporting Multicast DVOQR) and SMDL-DVOQR (Supporting Multicast Double Lane DVOQR) are based on DVOQR and support multicast efficiently. SM-DVOQR supports the XY and YX multicast routing algorithms. These two algorithms cause a channel load imbalance between the X and Y directions, and the imbalance grows with the network size. SMDL-DVOQR, which has two lanes, supports the BDOR and MPDOR multicast routing algorithms by having one lane carry the XY multicast algorithm and the other carry the YX multicast algorithm. Simulation results under random multicast traffic show that network performance increases with the number of local output ports, for which the optimal value is 2, and that SMDL-DVOQR, which balances the network load, achieves better network performance than SM-DVOQR.

5. Leakage power optimization strategies for DVOQR.
RTL-level power analysis of DVOQR shows that the UDB and VOAQ are the main consumers of leakage power, accounting for 85% of the total leakage power, and that leakage power is an important component of total power consumption under low network traffic. Adaptive buffer management and two-entry-never-turned-off are the two leakage power optimization strategies. The adaptive buffer management strategy effectively reduces leakage power consumption, but at lower network injection rates the buffer wake-up delay is added to the average network delay. The look-ahead wake-up technique can fully hide the wake-up delay when Twakeup=1, while the two-entry-never-turned-off strategy can tolerate a larger wake-up delay. The leakage power saving rate of the two-entry-never-turned-off strategy is lower than that of the adaptive buffer management strategy under low network injection rates, but the saving rates of the two strategies are almost the same under medium or high network injection rates.
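
The channel load imbalance that item 4 attributes to XY and YX multicast routing can be illustrated with a small Python sketch. It builds a multicast tree as the union of unicast dimension-order paths from a single source and counts the links used in each dimension. This tree construction and the names xy_path, yx_path, multicast_tree and links_per_dimension are assumptions made for illustration only; the abstract does not describe how the thesis builds its multicast trees or measures channel load.

```python
from typing import Callable, Iterable, List, Set, Tuple

Node = Tuple[int, int]    # (x, y) coordinates in a 2D mesh
Link = Tuple[Node, Node]  # directed link between two adjacent nodes


def xy_path(src: Node, dst: Node) -> List[Link]:
    """Links of a dimension-order route that travels along X first, then Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nxt = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nxt, y)))
        x = nxt
    while y != dst[1]:
        nxt = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, nxt)))
        y = nxt
    return links


def yx_path(src: Node, dst: Node) -> List[Link]:
    """Y-first route, obtained by swapping coordinates around xy_path."""
    swapped = xy_path((src[1], src[0]), (dst[1], dst[0]))
    return [((a[1], a[0]), (b[1], b[0])) for a, b in swapped]


def multicast_tree(src: Node, dsts: Iterable[Node],
                   path_fn: Callable[[Node, Node], List[Link]]) -> Set[Link]:
    """Hypothetical tree model: union of the unicast paths to every destination.

    Shared links are counted once, as if the packet were replicated at branches.
    """
    tree: Set[Link] = set()
    for dst in dsts:
        tree.update(path_fn(src, dst))
    return tree


def links_per_dimension(tree: Set[Link]) -> Tuple[int, int]:
    """Return (number of X-direction links, number of Y-direction links)."""
    x_links = sum(1 for a, b in tree if a[1] == b[1])  # same y => X link
    return x_links, len(tree) - x_links


if __name__ == "__main__":
    for n in (4, 8):
        everyone = [(x, y) for x in range(n) for y in range(n)]
        xy = links_per_dimension(multicast_tree((0, 0), everyone, xy_path))
        yx = links_per_dimension(multicast_tree((0, 0), everyone, yx_path))
        print(f"{n}x{n} broadcast: XY uses {xy}, YX uses {yx} (X links, Y links)")
```

Under these assumptions, a broadcast in an n x n mesh uses n-1 X links and n(n-1) Y links with XY routing, and the mirror image with YX routing, so the gap between the two directions widens as the mesh grows. Splitting traffic between an XY lane and a YX lane, as SMDL-DVOQR is described as doing, is one way such a skew could be evened out.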
Keywords/Search Tags: Networks on Chip, Throughput, Latency, Routing Algorithm, Router, Bufferless Router, Multicast, Link, Leakage Power, Logical Effort, Load Balance, Channel Double Buffer, Based on Encoding Allocation