
Research On Many-core On-chip Network And Memory Hierarchy

Posted on: 2011-10-03, Degree: Doctor, Type: Dissertation
Country: China, Candidate: X Wang, Full Text: PDF
GTID: 1118360305453464, Subject: Circuits and Systems
Abstract/Summary:
With advances in VLSI technology, more and more computing resources can be integrated on a single chip. Accordingly, microprocessors have evolved from single-core to multi-core systems, and multi-core, or even many-core, processors have become mainstream products on the market. The development of multi-core parallel processors draws attention to two new research topics: the on-chip network (Network-on-Chip, NoC) and the programming model for multi-core processor systems. On a multi-core processor, on-chip storage units and computation units are connected by the on-chip network. Its responsibility is to correctly transport instructions, operands, cache and memory blocks, synchronization messages, state messages, and control messages to their destinations with minimal delay. Different transport requirements call for different transport patterns. As the largest on-chip component of a multi-core processor system, the on-chip network directly affects the performance, power efficiency, reliability, and cost of the whole system.

Currently, there are many different architecture designs for multi-core processors, and different architectures place different requirements on on-chip communication. For example, the communication patterns of "shared-memory" and "distributed-memory" systems are quite different. Each routing algorithm and network topology has its own worst-case traffic pattern. Therefore, it is necessary to analyze the characteristics of the communication patterns on a multi-core processor chip before deciding on the final architecture of the on-chip network. Whatever the parallel architecture, for a given on-chip interconnect design, high throughput, low latency, low power consumption, and small die area are the designer's most important concerns.
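The point that each routing algorithm and topology has its own worst-case traffic pattern can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: a 4x4 mesh, dimension-order (XY) routing, and two textbook traffic patterns (uniform all-to-all and transpose). The code counts how many source-destination flows cross each outgoing channel and reports the busiest one.

```c
#include <assert.h>
#include <string.h>

#define K 4            /* assumed mesh size: K x K routers */
#define N (K * K)

/* The four outgoing channels of each router. */
enum { EAST, WEST, NORTH, SOUTH, NDIR };

/* load[node][dir] = number of flows crossing that outgoing channel. */
static int load[N][NDIR];

/* Route one flow with XY routing (X dimension first, then Y),
   incrementing the load of every channel it uses. Returns the hop count. */
int route_xy(int src, int dst) {
    assert(src >= 0 && src < N && dst >= 0 && dst < N);
    int x = src % K, y = src / K;
    int dx = dst % K, dy = dst / K;
    int hops = 0;
    while (x != dx) {
        load[y * K + x][dx > x ? EAST : WEST]++;
        x += (dx > x) ? 1 : -1;
        hops++;
    }
    while (y != dy) {
        load[y * K + x][dy > y ? NORTH : SOUTH]++;
        y += (dy > y) ? 1 : -1;
        hops++;
    }
    return hops;
}

/* Largest flow count over all channels. */
int max_load(void) {
    int m = 0;
    for (int n = 0; n < N; n++)
        for (int d = 0; d < NDIR; d++)
            if (load[n][d] > m) m = load[n][d];
    return m;
}

/* Uniform pattern: every node sends one flow to every other node. */
int max_load_uniform(void) {
    memset(load, 0, sizeof load);
    for (int s = 0; s < N; s++)
        for (int d = 0; d < N; d++)
            if (s != d) route_xy(s, d);
    return max_load();
}

/* Transpose pattern: node (x, y) sends one flow to node (y, x). */
int max_load_transpose(void) {
    memset(load, 0, sizeof load);
    for (int s = 0; s < N; s++) {
        int d = (s % K) * K + (s / K);
        if (s != d) route_xy(s, d);
    }
    return max_load();
}
```

On this assumed 4x4 mesh, `max_load_uniform()` finds 16 flows on the busiest channel when each node spreads its traffic over 15 destinations, while `max_load_transpose()` finds 3 when each node sends to a single destination. Per unit of injected traffic that is roughly 16/15 against 3, so transpose stresses XY routing far more than uniform traffic does. A different routing algorithm or topology would shift the bottleneck to a different pattern.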
The complexity of the network is also a very important factor because of the restrictions of modern design flows and CAD tools. The three most important issues in the design of an on-chip network are topology, routing algorithm, and flow-control mechanism. The topology directly determines how the on-chip resources are organized, as well as the maximum throughput and minimum delay of the network. The purpose of the routing algorithm and flow control is to make the best use of the network's potential.

In this thesis, we analyze the architectures of different multi-core processor systems, study the characteristics of their communication patterns, evaluate the network channel workload, average latency, and load balance, and propose a set of rules that can be used in the design of on-chip networks for multi-core processors. We find that on a multi-core processor with a shared-memory design, the communication pattern is largely determined by the layout of the storage units and the synchronization policy. Based on this observation, we propose Global-Feedback Flow Control, Message Classification, and Shared Level-2 Cache Absorption to relieve the network congestion caused by burst accesses. With these methods, the average latency of point-to-point communication during burst-access periods is greatly reduced.

For many-core processor systems with "distributed-shared-memory", we propose extending the current OpenMP parallel programming model to improve network transport efficiency. A parallel programming model designed for many-core processors gives programmers an easier way to manage on-chip resources. It makes on-chip data transport more efficient and thus improves the performance of the many-core processor system. To our knowledge, the most widely accepted parallel programming solution for many-core processors is OpenMP.
However, all current OpenMP directives are used only to decompose computation (such as loop iterations, tasks, and code sections). None of them can be used to control data movement, which is crucial for the performance of programs running on many-core processors with a software-managed memory hierarchy. In Chapter 5, we propose a new technique called "tile percolation". It provides a set of new OpenMP pragma directives that programmers can use to annotate their programs, specifying "where" and "how" to perform data movement. The compiler then generates the required code accordingly. Our method is a semi-automatic code generation approach intended to simplify the programmer's work.

Chapter 2 introduces fundamental concepts of on-chip networks and on-chip memory in many-core processor design. Chapter 3 presents an analysis of the on-chip storage and on-chip network system designed for the Godson-T many-core processor, and describes the characteristics of on-chip data transport on such "tile-based", "shared-memory-based" many-core processors. Chapter 4 discusses the "Global-Feedback Flow-Control" mechanism designed for many-core processors. Chapter 5 describes the design and implementation of "tile percolation", an OpenMP-based parallel programming model for many-core processors with a software-managed memory hierarchy. Chapter 6 concludes the thesis and outlines future work.
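To make concrete the data-movement problem that tile percolation targets, the following sketch shows the kind of code a programmer has to write by hand when only standard OpenMP is available. It is not the thesis's directive syntax, which this abstract does not give. The function name `scale_tiled`, the tile size `TILE`, and the use of a stack buffer as a stand-in for software-managed on-chip memory are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define TILE 64   /* assumed tile size in elements */

/* Multiply each element of a[0..n) by factor, one tile at a time.
   The standard OpenMP pragma only splits the tile loop across threads;
   the copy-in and copy-out steps are written manually. */
void scale_tiled(float *a, int n, float factor) {
    assert(a != NULL || n == 0);
    #pragma omp parallel for
    for (int t = 0; t < n; t += TILE) {
        float local[TILE];   /* stand-in for on-chip scratchpad memory */
        int len = (n - t < TILE) ? (n - t) : TILE;

        /* Copy the tile in from off-chip memory. */
        memcpy(local, a + t, (size_t)len * sizeof(float));

        /* Compute on the local copy only. */
        for (int i = 0; i < len; i++)
            local[i] *= factor;

        /* Copy the result back out. */
        memcpy(a + t, local, (size_t)len * sizeof(float));
    }
}
```

The copy-in and copy-out lines are the part that standard directives cannot express. According to the abstract, the proposed pragmas let the programmer state where and how such movement happens and leave the generation of this code to the compiler. Compiled without OpenMP support, the pragma is ignored and the function runs sequentially with the same result.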
Keywords/Search Tags: Many-core, On-chip Network, OpenMP programming API