
Cross-Layered Application Layer Optimization Framework For Networks-on-Chips

Posted on: 2012-05-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Wang
Full Text: PDF
GTID: 1118330371456291
Subject: Signal and Information Processing
Abstract/Summary:
With the continuous scaling of CMOS technologies, the number of transistors integrated on a chip grows exponentially. Once CMOS technology scales below 65 nm, wire delay does not decrease although the feature size keeps shrinking. Chip performance cannot improve along with the increase in the number of transistors and clock frequency. With the growing number of transistors on a chip, multiple processor cores, memory units, and other intellectual property (IP) cores can be integrated into a single chip and connected by a network-on-chip (NoC). NoC design raises several challenges, including low energy consumption, high bandwidth and low transmission latency, scalability, and reliability. To meet these challenges in a systematic manner, we propose a layered interactive building block (LIB) methodology. Focusing on the application layer of the LIB methodology, we solve three outstanding problems in NoC design: application mapping, multicasting, and message-dependent deadlock avoidance.

This thesis first proposes an application mapping algorithm for 2D NoCs that is aware of both application communication behavior and topology. The algorithm is named template-aware efficient mapping (TEM). TEM classifies applications into two types according to their communication trace graphs: 1) applications with communication hot nodes and 2) applications without communication hot nodes. For the first type, TEM maps the hot nodes and their neighbors to tiles with minimum hop-count distance. For the second type, TEM partitions both the NoC and the application communication trace graph during the mapping process. TEM can be used with 2D mesh, torus, and butterfly fat tree topologies. The result from TEM can also serve as the initial population of a genetic algorithm (GA), giving a further optimized algorithm, GA+TEM.
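As a rough illustration of the hot-node case described above (not the TEM algorithm itself, whose details are not given in this abstract), the following Python sketch places a hot node on the most central tile of a 2D mesh, places its neighbors on the nearest free tiles, and scores the mapping by volume-weighted hop count. The function names, the center-first placement rule, and the traffic volumes are all assumptions made for illustration.

```python
from itertools import product


def hops(a, b):
    """Manhattan hop count between two (x, y) tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def weighted_hops(mapping, traffic):
    """Sum of volume * hop count over all (src, dst) -> volume flows."""
    return sum(vol * hops(mapping[s], mapping[d])
               for (s, d), vol in traffic.items())


def place_hot_node(hot, neighbor_volume, width, height):
    """Hypothetical greedy placement: hot node at the center, neighbors nearby.

    neighbor_volume maps each neighbor task to its traffic volume with the
    hot node. Neighbors with more traffic get the closer tiles. Neighbors
    beyond the number of free tiles are left unmapped.
    """
    tiles = list(product(range(width), range(height)))
    cx, cy = (width - 1) / 2, (height - 1) / 2
    # Most central tile; ties are broken by tile coordinates.
    hot_tile = min(tiles, key=lambda t: (abs(t[0] - cx) + abs(t[1] - cy), t))
    # Free tiles ordered by hop count from the hot node's tile.
    free = sorted((t for t in tiles if t != hot_tile),
                  key=lambda t: (hops(t, hot_tile), t))
    # Heaviest communicating neighbors first.
    order = sorted(neighbor_volume, key=lambda n: (-neighbor_volume[n], n))
    mapping = {hot: hot_tile}
    mapping.update(zip(order, free))
    return mapping


# Example with made-up volumes on a 3x3 mesh.
volumes = {"a": 10, "b": 5, "c": 1, "d": 1, "e": 1}
mapping = place_hot_node("h", volumes, 3, 3)
traffic = {("h", task): vol for task, vol in volumes.items()}
print(mapping["h"], weighted_hops(mapping, traffic))
```

In this 3x3 example the hot node lands on tile (1, 1) and its four heaviest neighbors sit one hop away. A mapping like this could be scored with `weighted_hops` before and after any further optimization step.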
Evaluated with communication traces from the SPLASH-2 benchmarks on the Noxim NoC simulator, GA+TEM reduces communication energy by 5% to 20% compared to GA alone.

This thesis also proposes a run-time incremental mapping heuristic for 3D NoCs, named the energy-efficient run-time incremental mapping framework (ERIM). ERIM classifies applications into two types: 1) communication-centric and 2) computation-centric. For communication-centric applications, ERIM utilizes the increased degree in the vertical direction to reduce communication energy consumption. For computation-centric applications, ERIM balances the temperature of the processors running tasks to avoid thermal violations. The experimental results confirm that the mapping produced by ERIM reduces energy consumption by 15% compared to two other greedy heuristics.

Next, this thesis studies multicasting within irregular regions of an NoC system to which multiple applications are allocated. It proposes an irregular-region-oriented multicasting strategy based on the following idea: starting from an existing multicasting algorithm, e.g. multicasting XY, whenever an output channel of a node connects to a node that is not in the same region, an alternative direction is selected. Based on this strategy, a 2D region-oriented alternative multicasting XY routing algorithm (AL+XY) is proposed. The experimental results confirm that AL+XY reduces both communication energy consumption and average latency. At an injection rate of 0.4 flit/cycle and a multicasting-to-unicasting ratio of 0.3, the communication energy consumption of multiple unicasting and of region-based broadcasting is 2.2x and 2x that of AL+XY, respectively. At the same ratio and injection rate, the average latency of multiple unicasting and of region-based broadcasting is 11x and 1.3x that of AL+XY, respectively.
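The "alternative direction" idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the thesis's AL+XY algorithm: the region is a plain set of (x, y) tiles, only productive hops (those that move closer to the destination) are tried, deadlock freedom is not addressed, and a multicast is modeled as the union of the links used by per-destination paths. All names are hypothetical.

```python
def xy_candidates(cur, dst):
    """Productive next tiles toward dst: X direction first, then Y."""
    (x, y), (dx, dy) = cur, dst
    cands = []
    if dx != x:
        cands.append((x + (1 if dx > x else -1), y))
    if dy != y:
        cands.append((x, y + (1 if dy > y else -1)))
    return cands


def next_hop(cur, dst, region):
    """Preferred XY hop if it stays in the region, else the alternative."""
    for tile in xy_candidates(cur, dst):
        if tile in region:
            return tile
    return None


def route(src, dst, region):
    """Path from src to dst that never leaves the region."""
    path = [src]
    while path[-1] != dst:
        nxt = next_hop(path[-1], dst, region)
        if nxt is None:
            raise ValueError("no productive in-region hop from %r" % (path[-1],))
        path.append(nxt)
    return path


def multicast_links(src, dsts, region):
    """Set of directed links used when paths to all dsts share common hops."""
    links = set()
    for dst in dsts:
        path = route(src, dst, region)
        links.update(zip(path, path[1:]))
    return links


# L-shaped region: plain XY from (0, 0) to (2, 2) would leave it at (1, 0).
region = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}
print(route((0, 0), (2, 2), region))
print(len(multicast_links((0, 0), [(1, 2), (2, 2)], region)))
```

In this example the shared tree uses 4 links, whereas two separate unicast packets would traverse 3 + 4 = 7 links. This is the kind of saving that motivates multicasting over multiple unicasting.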
AL+XY can be extended to AL+XYZ, which supports region-based multicasting in 3D NoCs. AL+XY and AL+XYZ routers were synthesized with a TSMC 65 nm library and can operate at 800 MHz. The area of an AL+XY router is 3% larger than that of a 2D unicasting router, while the area of an AL+XYZ router is 7% larger than that of a 3D unicasting router.

Finally, this thesis proposes a method for avoiding request-request type message-dependent deadlock in peer-to-peer streaming systems. Message-dependent deadlock occurs when messages cannot be consumed by their destinations and remain stored in the network, so that the inter-dependency among the messages causes deadlock. This thesis proves a sufficient condition for avoiding request-request type message-dependent deadlock by adding non-uniform virtual channels (i.e. the number of virtual channels differs among the input ports of the routers). Based on this theory, it proves that finding the minimal number of non-uniform virtual channels is an NP-complete problem, and therefore proposes an integer linear programming based algorithm named path selection and minimum virtual channel allocation (PSMV). PSMV can be integrated with existing mapping algorithms to generate deadlock-free mapping results. The results from PSMV have low latency and low additional buffer cost.
Keywords/Search Tags: networks-on-chips, application mapping, multicasting, deadlock avoidance