Font Size: a A A

Research On The Key Techniques Of Networks-on-Chip Architecture Design And Performance Analysis

Posted on:2016-12-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:B L LiFull Text:PDF
GTID:1108330509961038Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of semiconductor technology, the number of processing elements integrated on a single chip increases significantly. As a basic communication fabric, Networks-on-Chip(No C) becomes more and more popular for both academic community and industry. Communication performance and hardware cost are two key criteria for the availability of No C, and how to design a network to meet the strict communication requirement of on-chip applications with acceptable hardware cost is the key issue of this area. In this thesis, we perform in-depth research on the hardware design and performance evaluation of No C. The main contribution of this thesis is summarized as follows.(1) Adaptive Virtual Channel Sharing Router Microarchitecture for No CThe performance and hardware cost of wormhole-switched No C router heavily depend on the organization and management scheme of input buffer. In order to achieve high performance without introducing significant power and area burden, the management of buffer resources should adapt the dynamical change of traffic load. Nowdays,nearly all the existing solutions try to share buffer space among different input ports or Virtual Channels(VCs) of the same port to improve the buffer utilization, ignoring the actual buffer requirement of different ports.In this thesis, we first analyze the disadvantage of conventional router and propose an inter-port Adaptive VC Sharing(AVCS) router microarchitecture, which adjusts the VC number and buffer capacity each input port can use to improve the buffer utilization and network performance. First, we analyzed the shortcoming of conventional router architecture, and proposed the framework of AVCS. In this architecture, each input port contributes a small portion of buffer resource to form the shared buffer. Then, we proposed an algorithm to allocate the shared buffer resource for each input port on demand. Finally,we introduce the idea of port sharing while implementing the VC Allocator(VCA) and SWitch Allocator(SWA), which utilizes a smaller VCA and SWA to meet the demand of large-scale VCs and switch allocation. Based on a detailed RTL implementation, we evaluate the performance and hardware overhead of our proposal. Experimental results show that AVCS saves 32.1% power and 11.7% chip area when compared with the typical router architecture with similar performance.(2) Worst-case Delay Analysis for Wormhole-switched NoCTo deploy real-time applications on wormhole-switched No C, the worst-case delay constraint of each flow must be analyzed and guaranteed. Researchers have proposed Flow Level Analysis(FLA), Link Level Analysis(LLA) and Deterministic Network Calculus(DNC) based analysis methods to meet this requirement. Whereas, both FLA and LLA can only be applied to analyze the delay bound of No C with infinite buffer size.Although the DNC method is able to analyze No C with limited buffer space, the delay bound obtained by DNC method can be further improved.In this thesis, we build an Real-Time Calculus(RTC) based performance model for the priority-aware wormhole-switched No C to overcome the limitation of FLA and LLA,and improve the delay bound obtained by DNC. The performance model consists of traffic model and service model. First, to derive the traffic model, we propose a theorem to transform the flit-level RTC arrival curve to the packet-level RTC arrival curve. Then,we construct a basic RTC service model for wormhole router with infinite buffer capacity and a theorem to derive the RTC service curve provided by the flow controller. Finally,we propose a delay analysis algorithm based on the constructed performance model. Our delay analysis algorithm outperforms FLA and LLA due to its ability to analyze the No C with priority sharing and limited buffer size. In addition, our delay analysis algorithm can give tighter delay bound than the DNC method, because it takes the maximum service capability of routers and minimum arrival rate of each flow into consideration. The tightness and correctness of our algorithm are verified by comparing with simulation results.(3) Buffer Sizing Algorithm for Wormhole-switched No CNowadays, researchers have proposed VC sharing and LLA-based buffer sizing algorithm to reduce the hardware cost of priority-aware wormhole-switched No C. The VC sharing scheme reduces the total number of VCs by sharing VC resources among multiple flows. However, this scheme will lead to performance degradation and even deadlock under some routing polices. The LLA-based method keeps the VC number unchanged and tries to reduce the VC depth. However, the allocation results can be further improved since it refers to the minimum buffer size that does not trigger the flow control.In this thesis, we proposed an RTC based buffer sizing algorithm to further reduce the hardware cost of wormhole-switched No C. This algorithm tries to reduce the buffer resource reserved along the path of each flow under the deadline constraint in priority order. First, we give a sufficient condition to determine the minimum buffer size large enough to avoid flow control, which can be applied to derive the initial value of the entire iterative buffer sizing algorithm. Second, we present a buffer reduction procedure based on DNC theory to reduce the buffer resources reserved for each flow along its transmission path. Finally, we proposed a buffer sizing algorithm based on the work done previously to reduce the hardware cost of No C router under the deadline constraint. Compared with the existing methods, our algorithm can reduce the buffer requirement of each flow significantly.(4) Route Selection Algorithm and Its Verification MethodIn wormhole-switched No C, end-to-end delay of each flow heavily depends on its transmission path and the traffic characterization of all the flows. In order to optimize the delay bound of each flow, we must take the lower priority flows into consideration while selecting path for higher priority flows, and select a minimum interference path for each flow. After the transmission path has been determined, we also need an effective way to verify whether the deadline constraint of each flow can be satisfied.In this thesis, we propose a route selection algorithm for the mesh topology to to optimize the end-to-end delay of each flow. This algorithm tries to find the minimum interference path for each flow based on Dijkstra algorithm. The weight of each link describe the importance of each link, which is determined according to some basic properties of mesh topology and conclusion of Combinatoric Mathematics, which significantly reduces the computation complexity and storage overhead of the previously proposed recursive method. After determined the transmission path, we then invoke the proposed delay analysis algorithm to verify the deadline constraint of each flow. We also proposed several simplification strategies to accelerate the execution of entire algorithm. Experimental results show that, the proposed route selection algorithm can reduce the worst-case delay of each flow significantly.In summary, this thesis investigates several important issues in the research of No C performance analysis and hardware improvement, which is of great value to the implementation and application of NoC.
Keywords/Search Tags:Networks-on-Chip, Warmhole-Switching, Real-Time Communication, Real-Time Calculus, Buffer Sizing, End-to-End Delay Analysis, Virtual Channel Sharing
PDF Full Text Request
Related items