Research On The Key Technology Of Fault Tolerance For Network-on-chip

Posted on: 2016-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330536967609
Subject: Computer Science and Technology
Abstract/Summary:
With the development of communication-intensive multi-core chips, the traditional bus can no longer satisfy their immense communication demands. The Network-on-Chip (NoC) possesses high flexibility and scalability and has become an efficient way to solve the communication problem of multi-core chips. However, increasing transistor density also brings widespread reliability challenges. In this thesis, we mainly discuss fault-tolerant strategies for permanent faults on links and router components.

First, we propose a runtime fault-tolerant routing algorithm based on region flooding to handle permanent faults on NoC links. To provide runtime fault detection and fault recovery, we introduce a fault-tolerant MPI-like communication protocol. It detects a link failure when requests go unanswered and automatically starts exploring a new path. The region flooding algorithm then searches for a healthy path and reroutes packets to avoid a system stall. It confines the search to the minimal rectangle defined by the source and destination nodes. By directing the search along minimal paths to the destination, the algorithm dramatically reduces useless packets and improves network efficiency. More importantly, it incurs only a small loss of fault tolerance. Experiments on the BookSim simulator show that our approach yields a dramatic latency decrease compared with the basic flooding algorithm; the maximum latency gap is 25% under the bit complement traffic pattern. We also compare the fault tolerance of our algorithm with that of the basic flooding algorithm: our approach loses only 2% fault tolerance for networks with a low fault rate (<4%). Finally, we use RTL-Router and DC to evaluate the area and power consumption of our design, which is 12% higher than that of the basic router.

Second, to tackle permanent faults on router components, we propose a high-performance, high-reliability, low-cost router design based on a generic 2-stage router. Four fault-tolerant strategies are added to our reliable router: a double routing strategy for route computation (RC) failures, a default winner strategy for virtual-channel allocation (VA) failures, a runtime arbiter selection strategy for switch allocation (SA) failures, and a double bypass bus strategy for crossbar failures. Unlike previous reliable routers, our design leverages pipeline optimization and the routing algorithm to maintain performance while tolerating faults, especially under heavy network loads. We compare our reliable router design with previous reliable router designs, and the experimental results show that our router obtains better performance, especially under heavy network loads. We also evaluate area consumption and use SPF to compare the reliability of our proposed router with other existing fault-tolerant NoC routers. The overhead of the corresponding circuitry in our router is 16% lower than that of Poluri's design, and its SPF is 44.7% higher. These results show that our proposed router achieves higher reliability at lower hardware cost.

In summary, our research addresses the reliability problem of NoC. First, we propose a runtime fault-tolerant routing algorithm based on region flooding to handle permanent faults on NoC links. This algorithm yields a dramatic performance increase at the cost of a small loss of fault tolerance. Second, to tackle permanent faults on router components, we propose a reliable router design based on a generic 2-stage router. It can tolerate multiple faults on pipeline units while achieving high reliability, high performance, and low cost. This work is relevant to both future engineering applications and theoretical study.
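The region flooding search described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a 2D mesh with (x, y) node coordinates, models flooding as a breadth-first search, and represents a permanently faulty link as an unordered pair of nodes. The names `link` and `flood_search` and the `region_only` flag are hypothetical, introduced only for this illustration.

```python
from collections import deque


def link(a, b):
    """Hypothetical helper: an undirected link between two mesh nodes."""
    return frozenset((a, b))


def flood_search(src, dst, width, height, faulty_links, region_only=True):
    """Breadth-first model of a flooding path search on a width x height mesh.

    With region_only=True the search is confined to the minimal rectangle
    spanned by src and dst; otherwise it may use the whole mesh, which
    stands in for basic flooding.
    Returns (path, visited_count); path is None if no healthy route is found.
    """
    if region_only:
        x_lo, x_hi = sorted((src[0], dst[0]))
        y_lo, y_hi = sorted((src[1], dst[1]))
    else:
        x_lo, x_hi, y_lo, y_hi = 0, width - 1, 0, height - 1

    parent = {src: None}  # also serves as the set of nodes reached so far
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], len(parent)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (x_lo <= nxt[0] <= x_hi and y_lo <= nxt[1] <= y_hi):
                continue  # outside the allowed search region
            if link(node, nxt) in faulty_links or nxt in parent:
                continue  # broken link, or node already reached
            parent[nxt] = node
            queue.append(nxt)
    return None, len(parent)


if __name__ == "__main__":
    faults = {link((0, 0), (1, 0))}
    print(flood_search((0, 0), (2, 1), 4, 4, faults, region_only=True))
    print(flood_search((0, 0), (2, 1), 4, 4, faults, region_only=False))
```

In this toy model the confined search reaches fewer nodes than the unrestricted one, which corresponds to the reduction in useless packets. It also shows where the small loss of fault tolerance comes from: when source and destination share a row and the link between them is faulty, the only healthy detour lies outside the rectangle, so the confined search fails while basic flooding still succeeds.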
Keywords/Search Tags:Network-on-chip, fault tolerance, region flooding, high performance, double routing, default winner, runtime arbiter selection, double bypass bus