Research On Many-core On-chip Network And Memory Hierarchy

Posted on:2011-10-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Wang

Full Text:PDF

GTID:1118360305453464

Subject:Circuits and Systems

Abstract/Summary:

With the development of VLSI technology, more and more computing resources can be integrated on a single chip. Accordingly, microprocessors have evolved from single-core to multi-core systems, and multi-core, or even many-core, processors have become mainstream products on the market. The development of multi-core parallel processors draws attention to two new research topics: the on-chip network (Network-on-Chip, NoC) and the programming model for multi-core processor systems. On a multi-core processor, on-chip storage units and computation units are connected by the on-chip network. Its responsibility is to correctly transport instructions, operands, cache and memory blocks, synchronization messages, state messages, and control messages to their destinations with minimal delay. Different transport patterns must be used for different transport requirements. As the largest on-chip component of a multi-core processor system, the on-chip network directly affects the performance, power efficiency, reliability, and cost of the whole system.

Currently, there are many different architecture designs for multi-core processors, and different architectures place different requirements on on-chip communication. For example, the communication patterns of "shared-memory" and "distributed-memory" systems are quite different. Each routing algorithm and network topology has its own worst-case traffic pattern. Therefore, it is necessary to analyze the characteristics of the communication pattern on the multi-core processor chip before deciding on the final architecture of the on-chip network. Whatever the parallel architecture, for a given on-chip interconnection design, high throughput, short delay, low power consumption, and die area are the designer's most important concerns.
In addition, the complexity of the network is a very important factor because of the restrictions of modern design flows and CAD tools. The three most important issues in the design of an on-chip network are topology, routing algorithm, and flow-control mechanism. The topology determines how the on-chip resources are organized, as well as the maximal throughput and minimal delay of the network. The purpose of the routing algorithm and flow control is to make the best use of the network's potential.

In this thesis, we analyze the architectures of different multi-core processor systems, study the characteristics of their communication patterns, evaluate the workload of network channels, the average latency, and the load balance of the network, and propose a set of rules that can be used in the design of on-chip networks for multi-core processors. We find that, on a multi-core processor with a shared-memory design, the communication pattern is largely determined by the layout of the storage units and the synchronization policy. Based on this observation, we propose Global-Feedback Flow Control, Message Classification, and Shared Level-2 Cache Absorption to resolve the network congestion caused by burst accesses. With these methods, the average latency of point-to-point communication during periods of burst access is greatly reduced.

For many-core processor systems with "distributed-shared-memory", we propose extending the current OpenMP parallel programming model to improve network transport efficiency. A parallel programming model designed for many-core processors gives programmers an easier way to manage on-chip resources. It makes on-chip data transport more efficient and thus improves the performance of the many-core processor system. To our knowledge, the most widely accepted parallel programming solution for many-core processors is OpenMP.
However, all current OpenMP directives are used only to decompose computation (such as loop iterations, tasks, and code sections). None of them can be used to control data movement, which is crucial for the performance of programs running on many-core processors with a software-managed memory hierarchy. In Chapter 5, we propose a new technique called "tile percolation". It provides the programmer with a set of new OpenMP pragma directives. Programmers can use these directives to annotate their programs to specify "where" and "how" to perform data movement, and the compiler then generates the required code accordingly. Our method is a semi-automatic code generation approach intended to simplify the programmer's work.

Chapter 2 introduces some fundamental concepts of the on-chip network and on-chip memory in the design of many-core processors. Chapter 3 presents an analysis of the on-chip storage and on-chip network system designed for the Godson-T many-core processor, and describes the characteristics of on-chip data transport on such "tile-based" and "shared-memory-based" many-core processors. Chapter 4 discusses the "Global-Feedback Flow-Control" mechanism designed for the many-core processor. Chapter 5 illustrates the design and implementation of "tile percolation", an OpenMP-based parallel programming model for many-core processors with a software-managed memory hierarchy. Chapter 6 concludes the thesis and outlines future work.
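
To make the data-movement problem described in the abstract concrete, the following is a minimal sketch of the copy-in, compute, copy-out pattern that a programmer must otherwise write by hand on a machine with a software-managed memory hierarchy. It is not the thesis's "tile percolation" syntax, which the abstract does not show. The function name `scale_tiled`, the tile size `TILE`, and the use of a stack buffer as a stand-in for on-chip local memory are all assumptions made for illustration. Only the standard OpenMP `parallel for` directive is used, and the code also compiles and runs serially when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size: small enough to fit an assumed local memory. */
#define TILE 64

/* Multiply a[0..n) by k, staging one tile at a time through a local buffer.
 * The buffer stands in for software-managed on-chip memory (an assumption). */
void scale_tiled(double *a, size_t n, double k)
{
    assert(a != NULL || n == 0);

    long ntiles = (long)((n + TILE - 1) / TILE);

    /* Standard OpenMP only decomposes the loop; the data movement below
     * is written by hand. Without OpenMP the pragma is simply ignored. */
    #pragma omp parallel for schedule(static)
    for (long t = 0; t < ntiles; ++t) {
        double local[TILE];                     /* per-iteration local copy */
        size_t base = (size_t)t * TILE;
        size_t len = (n - base < TILE) ? (n - base) : TILE;

        memcpy(local, a + base, len * sizeof *a);   /* copy-in  */
        for (size_t i = 0; i < len; ++i)            /* compute  */
            local[i] *= k;
        memcpy(a + base, local, len * sizeof *a);   /* copy-out */
    }
}
```

Calling `scale_tiled(buf, 130, 2.0)` processes three tiles (64, 64, and 2 elements). The two `memcpy` calls are the part a directive-based approach of the kind the abstract describes would aim to have the compiler generate from annotations.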

Keywords/Search Tags:

Many-core, On-chip Network, OpenMP programming API

Related items

1	Design Of Parallel Huffman Compress Algorithm Program Based On OpenMP And Multi-Core Architecture
2	Research On Compilation And Optimization For OpenMP Programs
3	Study And Application Of OpenMP Parallel Programming Model And Optimization Method Of Performance
4	Parallel Programming And Optimization Based On Multi-core Processors
5	Research And Application Of MPI And OpenMP Hybrid Programming Model Based On SMP Clusters
6	Programming model and execution model for OpenMP on the Cyclops-64 manycore processor
7	Application Of Multi-core Parallel Programming Technology To Accelerate Digital Image Processing
8	Research On Multi-core Architecture And Prototype Implementation Technology Of Multi-Processor System-on-Chip And Network-on-chip
9	Research On High Performance Of GRAPES Tangent/Adjoint Model With The MPI/OpenMP
10	Efficient Network On Chip Architecture: Router Inside The Core