Font Size: a A A

Research On Key Technology Of On-chip Multicasting Communication Based On2D Mesh Network

Posted on:2013-07-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:W M HuFull Text:PDF
GTID:1268330392473805Subject:Electronic Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of integrated circuit and motivated by the high perfor-mancecomputing,thedevelopmentofSystem-on-Chip(SoC)followsthetrendsasbelow:frommulti-coretomany-core,frombustonetwork,fromcentralizedstoragetodistributedmemory and from2D integration to3D integration. Network-on-Chip becomes a promis-inginterconnectionarchitectureformany-coresystemduetohighbandwidth,lowpowerconsumption and good extendability. The basic processing unit for large-scale parallelcomputing is changed from a processor to a core, and the parallelism of program is ef-fected greatly by the communication capability among the cores. Multicast is a typicalcommunication mode which is ubiquitous in large-scale parallel computing, and is a keyfactor that affects the parallel performance of chip multiprocessor(CMP). Traditionalunicast router only supports multicast at the software level, and it decreases the perfor-mance of CMP significantly. Hence, an efficient architecture of multicast router becomesan important research topic in on-chip communication.Bases on the platform of CMP of which the cores are connected by a2D mesh net-work, this dissertation focuses on the following points: the routing algorithm and thescheme of multicast, deadlock-free scheme of multicast routing and the acceleration ofmulticast communication. The detailed evaluation and analysis of them are also given.The major contributions can be summarized as follows:1)This dissertation proposes a two-period incremental multicast path setup approach(TPSS), which supports arbitrary shaped multicast tree, and can be fit to multiplerouting algorithms of multicast at basic-level. In the first period of TPSS, the unicastsetup packet is routed to a predetermined node before a new destination is assigned;Inthesecondperiod, theunicastsetuppacketisroutedtothenewdestination, andtherouting table is updated simultaneously according to the routing result. The wholemulticast tree is formed by multiple sub-paths of which each is setup by a unicastsetup packet. Bases on TPSS, this dissertation proposes three multicast routing al-gorithms:1) OPT, directed by the west-first turn model, seeks to find a bandwidth-efficient multicast tree. The algorithm optimizes the tree targeting at all the desti-nations globally, and a most power-efficient and band-efficient tree can be achievedby using the least links.2. LXYROPT, directed by the west-first turn model, is apartial optimized algorithm. It optimizes the multicast path located on the right sideof source node, and uses XY routing algorithm to generate the partial tree for the rest part. All the distances from the destinations to source are minimal, which keeps thegood performance with some saving on power consumption.3. CFG can achieve apower-efficient bandwidth-efficient tree under the restriction of the scale factor. Thescale factor restricts the maximum distance from the destinations to the source to ob-tain a different multicast tree, of which the performance and the power consumptionare between that of OPT and LXYROPT. All proposed algorithms are considered asoff-line algorithms, and can be executed at the compile time. They fit for the ap-plications that the communication modes are determinate which can be obtained bythe full system simulation. TPSS also supports the realtime setup of multicast treetargeting at the application that the communication modes are indeterminate. It justforbids the first period of the TPSS, and acts like XY multicast routing. The resultof synthesis shows the overhead of proposed scheme is negligible. The evaluationindicates that proposed algorithms can achieve different power consumption savingsand bandwidth savings according to the differing goals of applications.2)Bases on the scheme of incremental multicast path setup using multiple unicast setuppackets, this dissertation presents an approach of multicast path setup that incorpo-ratesevicting,andastrategyusingoldtreetobuildanewone. Inthemulticastroutingschemebasedonlookuptable,ifatableentrycanbereused,whichmeansoldtreecanbe replaced by a new one, it is obvious that some hardware overhead can be reduced.Thus, this dissertation proposes a multicast path setup approach that incorporates theevicting. This scheme selects the operation according to the condition that whetherthe setup packet is on the path belonging to the new multicast tree. If so, the routingresult will be added into the table entry. Otherwise, the table entry will be clearedand then be updated with the routing result. Compared with ID scheme, proposedone breaks the limitation of evicting times, and the hardware overhead to keep ID isnot necessary. Furthermore, this dissertation proposes a scheme that using old treeto build a new one, and the result of evaluation shows that it can reduce the setuplatency significantly when the reusing condition is met.3)Bases on the analysis of the traces extracted from the multithread programs runningon the CMP, the dissertation proposes a hybrid scheme to avoid deadlock caused bythe cross dependency in the scenarios that multiple multicast packets exist on NoCsimultaneously, and it also supports virtualization at the NoC level. This schemeadopts whole-packet buffering and asynchronous replication to avoid deadlock forthe small-sized packet, and centralized allocation for the large-sized packets to con- trol the number of multicast communication. The hardware overhead is negligible.Because multicast only occupies a small fraction of communication traffic during theparallel programs running on CMP based on directory coherence, and especially thelarge-sized multicast packets occupies even less, the result of experiment shows thatthe performance decreasing is negligible. The virtualization at NoC level needs toisolate the traffic belongs to differing sub-network, thus, the hybrid scheme allowsthe centralized allocator to control the number of multicast which belong to differingsub-network respectively. Hence, the number of concurrent multicast supported isincreased.4)To reduce the latency per hop in communication, this dissertation proposes self-selection pseudo-circuit (SP). SP improves the communication performance by by-passing switch arbitration (SA) according to the history information of SA. By an-alyzing the communication records of parallel program running on CMP, it can befound that some inports of the router in NoC are used frequently, and the corre-sponding outports vary alternately. Pseudo-circuit cannot be feasible for this case.However, SP allows multiple outports to be connected to one inport, and the routinginformationcanbeusedtoselectthedesiredoutporttoforwardthepacket. Tofurtherimprove pseudo-circuit reusability, we adopt a novel router named XHINoC whicheliminates the virtual channel. Hence, the matching of virtual channel is removedfrom the conditions of reusing pseudo-circuit. The conditions are simplified as thesame inport requires the subset of reserved outports. The result of experiment showsthe performance improved by SP is significantly. Furthermore, bit string encodingis used in the dissertation as an extension to XHINoC, which makes the routing of apacket to be finished in a cycle, and the performance of multicast is improved. An-other extension is using the single-flit packet, which reduces the demand of ID slots,and the hardware overhead is reduced.
Keywords/Search Tags:Multicast, HomogeneousMany-coreProcessors, Network-on-Chip, Distributed Shared Memory, Deadlock, Routing Algorithm
PDF Full Text Request
Related items