
Research On Key Technologies Of Cost-efficient Fault-tolerant Networks-on-chip

Posted on: 2013-03-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Chen
GTID: 1268330392473779
Subject: Electronic Science and Technology
Abstract/Summary:
In the future, the number of processors on a single chip will reach hundreds or even thousands, and the communication bandwidth required between processors will be very large. Traditional on-chip interconnection methods cannot meet the communication requirements of multi-core chips because they scale poorly. As the feature size of CMOS technology shrinks, gate delay falls significantly while wire delay falls much more slowly, so wire delay now exceeds gate delay. Global interconnection therefore has to be designed carefully or discarded. Networks-on-Chip (NoCs) use local links instead of global interconnection wires and scale well enough to meet the communication requirements of multi-core chips.

More than 10% of transistors may be faulty because of technology variations or other reasons. As the CMOS feature size shrinks, wires become narrower and the probability of hard faults increases. Hard faults may paralyze an NoC. Existing NoCs can tolerate only a small number of faults, and their fault-tolerance granularity is coarse, which results in large area overheads. Designing a low-overhead fault-tolerant NoC has therefore been a major challenge. To improve the ability of NoCs to tolerate hard faults, this work starts from the implementation of a hardware-level simulation and design platform and studies router architecture, fault-tolerant routing algorithms, fine-grained network architecture and task mapping algorithms in depth. Its contributions are as follows:

1. A hardware-level on-chip network simulation and design platform, HardSim, is designed. It performs hardware-level simulation, which describes more hardware detail than flit-level simulation. It combines simulation and design verification in a unified flow and supports trace-based simulation of real applications. It implements two fault injection patterns, static and dynamic. Static faults are generated and loaded into the network before the simulation starts. Dynamic faults are generated and loaded into the network during the simulation.

2. A low-latency shared output buffered router, named SOBR, is presented. SOBR has 5 important features: (1) its virtual channels are located at the output ports rather than the input ports; (2) a virtual-channel swapper dynamically configures the access matrices, which increases the capacity of available buffers and improves network performance; (3) its dynamic FIFO buffer architecture supports leap-read operations to reduce packet blocking; (4) dynamic layered switching is used to improve network performance; (5) in the ideal case, all types of flits pass through a router in one clock cycle. Synthesis results in a 65 nm process show that its critical path is only 24 logic gates and its worst-case delay is about 0.64 ns. Owing to its single-cycle pipeline, the average latency of SOBR is clearly lower than that of other routers. For a 4×4 mesh under the uniform random traffic pattern, the maximum saturation throughput of SOBR reaches 0.86 flits per node per cycle. Because the VC allocation, switch allocation and switch modules are eliminated, the area of SOBR is up to 9.4% smaller than that of an input virtual-channel router with the same buffer size. Qualitative analysis shows that the virtual-channel swapper effectively enhances the fault tolerance of SOBR.

3. A low-overhead distributed fault-tolerant routing algorithm, PR-WF, is presented and integrated into the SOBR router. It is based on the turn model of west-first routing and uses a dynamic pseudo receiving (DPR) mechanism to enable or disable west turns, which ensures that the network is deadlock-free. Under the DPR mechanism, local network interfaces temporarily receive west-turn packets and then forward them to the west ports as soon as possible. To store west-turn packets, each local network interface has a FIFO buffer queue for each west turn. The DPR mechanism can turn off west turns to avoid deadlock. PR-WF chooses a suitable output port according to the state of the output links and neighboring routers, following a fixed priority rule. It is a logic-based distributed fault-tolerant routing algorithm, and its area overhead is much lower than that of table-based fault-tolerant routing. Its cost is independent of the network size, so it scales well. To avoid livelock, it needs to disable only 1.8% of good links at a 10% link fault rate. For a 9×9 mesh with 10% faulty links, its average hop count is only 8.34% higher than the shortest path. In summary, PR-WF is an efficient, low-overhead, distributed fault-tolerant routing algorithm.

4. A low-overhead fault-tolerant NoC architecture, SNoC, is presented. SNoC is based on channel slicing and couples all slices through slice interfaces, which lets the network tolerate faults at a fine granularity. Each router has 4 slices, and each link has 5 sub-links. The slice interface is self-reconfigurable and provides the optimal configuration according to the states of slices and links. Each slice uses the SOBR architecture and PR-WF fault-tolerant routing. Simulation results show that SNoC achieves good performance even at high fault rates. Synthesis results in a 65 nm process show that its router critical path is only 0.08 ns longer than that of the SOBR router. Its area is only about 1% larger than that of a plain channel-sliced network.

5. A low-overhead fault-tolerant task mapping algorithm, CMAP, is proposed. Existing task mapping algorithms are based on search methods, which are time-consuming and scale poorly. Construction algorithms instead build sub-optimal solutions step by step according to the characteristics of optimal solutions. They have low complexity and run faster than search algorithms. A construction algorithm, named CMAP, is therefore presented for the task mapping problem. It is topology-aware, so it can solve the task mapping problem for irregular topologies. It maps heavily weighted links of the task graph to single-hop routes as far as possible, or maps high-degree nodes of the task graph to locally optimal positions. Two real applications and a variety of task graphs are used to verify its accuracy, efficiency, scalability and fault tolerance.
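
Item 3 builds PR-WF on the west-first turn model. As a rough illustration of that baseline only, the Python sketch below picks an output port on a 2D mesh under plain west-first rules and skips faulty links. It is not PR-WF: it has no dynamic pseudo receiving and no neighbor-state priorities. The function names, the (x, y) coordinate convention and the `faulty` set of directed links are assumptions introduced here for illustration and do not come from the dissertation.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]    # (x, y): x grows eastward, y grows northward (assumed)
Link = Tuple[Node, Node]  # directed link (from_node, to_node)

STEP = {"W": (-1, 0), "E": (1, 0), "N": (0, 1), "S": (0, -1)}


def west_first_candidates(cur: Node, dst: Node) -> List[str]:
    """Productive ports permitted by the plain west-first turn model."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx < 0:
        # Every westward hop must be taken first, because turning
        # into the west direction later is prohibited.
        return ["W"]
    ports = []
    if dx > 0:
        ports.append("E")
    if dy > 0:
        ports.append("N")
    elif dy < 0:
        ports.append("S")
    return ports  # empty when cur == dst


def route(cur: Node, dst: Node, faulty: Set[Link]) -> Optional[str]:
    """Return "LOCAL" on arrival, a fault-free permitted port, or None."""
    if cur == dst:
        return "LOCAL"
    for port in west_first_candidates(cur, dst):
        sx, sy = STEP[port]
        nxt = (cur[0] + sx, cur[1] + sy)
        if (cur, nxt) not in faulty:
            return port
    return None  # every permitted port is faulty; the baseline is stuck


if __name__ == "__main__":
    print(route((1, 1), (3, 3), {((1, 1), (2, 1))}))  # east link faulty
    print(route((2, 2), (0, 2), {((2, 2), (1, 2))}))  # west link faulty
```

In the second call the only permitted port is west and that link is faulty, so the baseline returns no port. Cases like this are where a mechanism that selectively re-enables west turns, such as the DPR mechanism described in item 3, would be needed.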
Keywords/Search Tags: Networks-on-Chips, Router, Hard faults, Hardware-level simulation, Virtual output queues, Fault-tolerant, Routing algorithms, Channel slicing, Task mapping