Research On High-efficiency On-chip Routing Architecture And Its Optimization Techniques For Many-core Communication

Posted on: 2018-07-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Tang
GTID: 1368330569498494
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid advance of technology and changing application requirements, microprocessors have moved decisively into the multi-core and many-core era. Traditional on-chip communication architectures, such as the shared bus, point-to-point interconnection, and the crossbar switch, face great challenges in scalability, performance, area, power consumption, and reliability. These challenges stem from the growing scale of inter-core communication, tightening area and power constraints, and the rising probability of faults. To overcome these inherent limitations and accommodate the evolving environment and communication requirements, Network-on-Chip (NoC) technology has been widely studied for its high scalability, low latency, natural GALS clocking property, high bandwidth, and notable reusability, and it has become an outstanding paradigm for inter-core communication in current many-core microprocessors. However, as technology and the application environment evolve further, NoC itself faces severe challenges in performance, area, and reliability. In response to these challenges in the many-core environment, this dissertation studies high-efficiency on-chip routing architectures and optimization techniques for NoC, aiming to provide technical support for inter-core communication in further research on many-core microprocessors. The main contributions of this dissertation are as follows:

(1) A backhaul-route pre-configuration mechanism to optimize round-trip communication is proposed.

Cache coherence protocols are usually adopted to maintain the consistency and integrity of shared data in multi-core and many-core processors. While a cache coherence protocol is being processed, packets in application communication exhibit a round-trip communication pattern with a certain probability. To the best of our knowledge, there is no prior work, at home or abroad, on the specialized
optimization for the response packets traversal under the round-trip communication.According to the communication pattern,the paper explores the characteristics of round-trip communication pattern presented in the application communication,and proposes a backhaul-route pre-configuration mechanism(i.e.,BRPCM)to optimize the response packet transmission.The basic idea of BRPCM is to pre-configure a converse crossbar connection(i.e.,backhaul-route)within a single router during the previous request packets traversal,which is suited for its subsequent response packet traversal.Combining with corresponding routing algorithm,virtual channel distribution management,route reuse and termination mechanism,the subsequent response packets and even other packets satisfied with the comparative conditions are expected to reuse the backhaul-route and directly forward to crossbar without SA pipeline stage,and hence to bypass some pipeline stages(SA),as well as accelerate packets traversal.The experimental results of synthesized workload and the real application's trace workload communication show that BRPCM is superior in transmission delay and throughput to the existing architecture of router on chip.(2)A Hotspot-Route pre-configuration mechanism to optimize Communication performance of temporal-spatial locality is proposedCommunication packets transmission will present a certain temporal-spatial locality communication characteristics.The traditional route pre-configuration method presents a Pseudo Circuit scheme to accelerate packets traversal based on Communication temporal locality.However,the improvement of the communication performance mainly depends on the probability of temporal locality in communications.Moreover,when the injection rate is high,which induces the decrease of effective rate and limits the potential of performance increase.To solve this problem,we study the spatial locality communication characteristics in dimension ordered routing mode,and propose a 
straight-forward route pre-configuration mechanism (SFRPM), which optimizes for spatial locality in communication. Building on SFRPM, and fusing the pre-configuration and reuse features of the pseudo-circuit and the straight-forward route, we propose a hotspot-route pre-configuration mechanism (HRPCM) suited to temporal-spatial locality in communication. HRPCM can switch dynamically between the pseudo-circuit and the straight-forward route, guided by a priority strategy, according to the real-time status of the crossbar in the router and the utilization of virtual channels in the downstream router. This improves the effective rate and reusability of pre-configured routes in the router, which further decreases the average packet traversal latency. Experimental results on synthetic workloads and real application trace workloads show that HRPCM clearly reduces packet traversal latency and increases network throughput compared with the traditional on-chip router architecture.

(3) A fine-grained fault-tolerant routing algorithm based on fault-port loopback transmission is proposed.

Because the virtual channels (VCs) and the I/O datapath are tightly coupled, and because each VC of an input port can serve only a specific output port under the virtual output queue (VOQ) buffer strategy, the communication performance and fault-tolerance efficiency of a VOQ router are highly sensitive to faults in the VOQ node and the I/O datapath. Neither traditional coarse-grained nor traditional fine-grained fault-tolerant strategies adapt effectively to the fine-grained fault-tolerance requirements that the VOQ buffer places on VCs and the I/O datapath, which leads to low resource utilization and limited fault tolerance. To solve these problems, the dissertation focuses on the fault-tolerant design of on-chip routing architectures based on the VOQ buffer strategy. First, we build a fine-grained network fault model to effectively
refine node faults into VC faults and datapath faults in the network. On this basis, we further propose a fine-grained fault-tolerant routing algorithm based on fault-port loopback transmission, named FFR_FPLT. It uses the healthy VC and datapath resources that would otherwise be abandoned because of an input-link fault to tolerate some of the datapath faults occurring in the node, which improves network resource utilization and fault-tolerance performance. Experimental results on synthetic workloads and real application trace workloads show that, compared with the traditional fault-tolerant routing algorithm, FFR_FPLT performs better in networks with a high fault rate. It significantly improves the probability of forwarding a packet to its optimal output port, as well as throughput, average latency, and average hop count, at a very small hardware overhead.

(4) A unidirectional Mesh network topology for low-overhead NoC is proposed.

Design complexity, area, and power overhead have become the major limiting factors of NoC topologies for scalable multi-core and many-core processor systems. To overcome the power and area overhead, this dissertation proposes a unidirectional Mesh network topology (UniMESH) aimed at low-overhead NoC, achieving a network topology with low overhead and low complexity. Compared with the traditional 2D-Mesh topology, UniMESH simplifies the router architecture, uses only half the number of channel links while still guaranteeing a fully connected topology, and adopts a novel routing algorithm and a deadlock recovery mechanism to maintain network performance. The experimental results show that UniMESH effectively reduces design complexity and area cost, and markedly decreases unnecessary power consumption, compared with the Ring topology. Moreover, UniMESH reduces router area by 57.4% and total power consumption by 39.3%. Notably, it adds only 4.5 cycles of latency compared with the conventional 2D-mesh topology.
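
The halved link count claimed for the unidirectional mesh in contribution (4) can be illustrated with a small sketch. The abstract does not say how the UniMESH links are oriented, so the wiring below is a hypothetical one chosen only for illustration (alternating row and column directions, even mesh width `k`); the function names are likewise assumptions, not the dissertation's design. The sketch counts directed channels in a conventional 2D mesh and in the one-way variant, then checks that every router can still reach every other router.

```python
from collections import deque


def bidirectional_mesh(k):
    """Directed channels of a conventional k x k 2D mesh (two per neighbor pair)."""
    channels = set()
    for r in range(k):
        for c in range(k):
            if c + 1 < k:
                channels.add(((r, c), (r, c + 1)))
                channels.add(((r, c + 1), (r, c)))
            if r + 1 < k:
                channels.add(((r, c), (r + 1, c)))
                channels.add(((r + 1, c), (r, c)))
    return channels


def one_way_mesh(k):
    """Hypothetical one-way wiring (NOT taken from the dissertation).

    One channel per neighbor pair: even rows point east, odd rows west,
    even columns point up (toward row 0), odd columns point down.
    """
    if k < 2 or k % 2:
        raise ValueError("this illustrative wiring assumes an even k >= 2")
    channels = set()
    for r in range(k):
        for c in range(k):
            if c + 1 < k:
                a, b = (r, c), (r, c + 1)
                channels.add((a, b) if r % 2 == 0 else (b, a))
            if r + 1 < k:
                a, b = (r, c), (r + 1, c)
                channels.add((b, a) if c % 2 == 0 else (a, b))
    return channels


def reachable(channels, start, reverse=False):
    """Breadth-first search over the channels (optionally reversed)."""
    adjacency = {}
    for src, dst in channels:
        if reverse:
            src, dst = dst, src
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def strongly_connected(channels, k):
    """True if every router can reach, and be reached from, router (0, 0)."""
    nodes = {(r, c) for r in range(k) for c in range(k)}
    start = (0, 0)
    return (reachable(channels, start) == nodes
            and reachable(channels, start, reverse=True) == nodes)


if __name__ == "__main__":
    k = 4
    full = bidirectional_mesh(k)
    half = one_way_mesh(k)
    print(len(full), len(half), strongly_connected(half, k))
```

For a 4 x 4 mesh the conventional topology has 48 directed channels and this one-way wiring has 24, while remaining strongly connected. Longer paths are the price of removing half the channels, which is consistent with the added latency the abstract reports, although the figures there come from the author's own design and experiments, not from this sketch.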
Keywords/Search Tags:Many-core Communication, Network-on-Chip, Route Pre-Configuration, Temporal-Spatial Locality, Loopback Transmission, Fault-Tolerance Routing, Low-Cost, Deadlock Recovery