Research On The Design Techniques Of Cluster-on-Chip Architecture

Posted on:2011-12-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L B Huang

GTID:1118330332487007

Subject:Computer Science and Technology

Abstract/Summary:

Multicore architecture has become the necessary approach for improving processor performance in step with Moore's Law. However, as core counts increase and on-chip heterogeneity grows, the rapid gain in raw performance has not brought flexible multicore resource management or simple application programming. On the contrary, the complexity of multicore structures makes it difficult to use the large number of on-chip resources efficiently. To resolve the contradiction between growing computing power and the relatively backward multicore computing model and management, this dissertation introduces HPC cluster technology into multicore architecture design, resulting in the cluster-on-chip (CoC) computing paradigm, which provides good support for efficient multicore organization and utilization. Our research on CoC starts from the underlying hardware and works across different implementation levels, including CoC hardware structures and the parallel programming model. The work explores key technologies that allow multicore performance to keep pace with Moore's Law. The main contributions are as follows:

1. We put forward a low-cost, high-performance floating-point SIMD accelerator architecture for the CoC computing node. The shared, low-cost design method is mature for fixed-point SIMD accelerators, but not for floating-point SIMD accelerators, which still rely on a simple replication design method. We present the first floating-point SIMD accelerator architecture with a hardware-sharing design: the original double-precision datapath can be segmented to support two parallel single-precision operations. In addition, to address the increasing demand for 128-bit floating-point computation, we propose a low-cost 128-bit SIMD hardware design. The experimental results show that the proposed SIMD accelerator structure effectively reduces hardware cost and power consumption.

2. We propose an efficient data-parallel accelerator architecture that overcomes the performance bottlenecks of such accelerators. Data-parallel architectures face many obstacles, such as non-aligned access, data permutation, and control flow, which prevent them from reaching their theoretical performance. After quantifying the characteristics of data permutation operations in vectorized applications, we propose the implicit data permutation (IDP) mechanism along with its hardware structure and compiler strategy. It significantly reduces explicit data permutation operations and effectively overcomes the permutation bottleneck of the data-parallel accelerator. We also propose a vectorized loop buffering mechanism, which eliminates vector loop control and address calculation overhead so that loops execute efficiently on the data-parallel accelerator. Based on these techniques, we introduce an efficient, parallel, high-performance multimedia accelerator called MCP.

3. We present an efficient on-chip network architecture for CoC. The classic network-on-chip design is optimized only for long unicast communication and usually suffers from power and latency defects. This is awkward for important applications, such as cache coherence protocols and SIMD computations, that require extensive multicast or broadcast communication. We present a hierarchical virtual bus interconnection structure. Reusing the existing datapath of the network links, it reconstructs a virtual bus dynamically upon request and provides low-latency unicast and multicast/broadcast communication services. We also propose a hardware scheme that supports a hybrid shared-memory/message-passing programming model, and we design the corresponding memory hierarchy and coherence protocol, which are seamlessly compatible with existing MPI and OpenMP programs.

4. We design an efficient hybrid parallel programming model for CoC that exploits various levels of parallelism. In addition, because a SIMD accelerator can achieve higher performance at lower hardware cost and power consumption than a multicore architecture, we introduce the loop-based streamization programming model (LSM) for the data-parallel accelerator in the CoC computing node. Like incremental OpenMP programming, it reduces the complexity of traditional stream programming and greatly lightens the burden on the programmer. A hardware extension to the general-purpose processor (GPP) is also proposed. The experiments show that LSM uses the data-parallel accelerator efficiently and achieves a large performance improvement.

CoC architecture design is a new topic. Existing research remains at the conceptual level and does not address concrete design. In this dissertation, we study CoC design from three aspects: data-parallel core design for the CoC node, the CoC on-chip network architecture, and the CoC parallel programming model. The implementation, verification, and evaluation results show that these techniques are effective and can be used in the design and implementation of future multicore microprocessors.
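
The datapath-segmentation idea in contribution 1 (one wide datapath reused as two narrower parallel lanes) can be illustrated with a small integer analogy. The sketch below is not the dissertation's floating-point design; it only assumes a 64-bit adder that can either run as one wide unit or be segmented into two 32-bit lanes by blocking the carry at the lane boundary. All names (`pack`, `unpack`, `add_wide`, `add_split`) are hypothetical.

```python
# Integer analogy only: a 64-bit adder reused as two independent 32-bit lanes.
# Every name here is hypothetical and not taken from the dissertation.

MASK64 = (1 << 64) - 1
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1
HIGH = 0x8000000080000000          # top bit of each 32-bit lane
LOW_BITS = MASK64 ^ HIGH           # the lower 31 bits of each lane


def pack(lo, hi):
    """Place two 32-bit values into one 64-bit word (lo in bits 0..31)."""
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def unpack(word):
    """Return the (lo, hi) 32-bit lanes of a 64-bit word."""
    return word & LANE_MASK, (word >> LANE_BITS) & LANE_MASK


def add_wide(a, b):
    """Unsegmented mode: one 64-bit addition, carries cross the lane boundary."""
    return (a + b) & MASK64


def add_split(a, b):
    """Segmented mode: two 32-bit additions in a single pass.

    The lower 31 bits of each lane are added first, so no carry can leave a
    lane; the top bit of each lane is then restored with an XOR.
    """
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    return (partial ^ ((a ^ b) & HIGH)) & MASK64


if __name__ == "__main__":
    a = pack(0xFFFFFFFF, 1)
    b = pack(1, 2)
    print(unpack(add_split(a, b)))  # low lane wraps, high lane is unaffected
    print(unpack(add_wide(a, b)))   # the carry leaks into the high lane
```

The only difference between the two modes is whether the carry is allowed across bit 31, which is why a segmented design can share most of the wide datapath rather than replicating it.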

Keywords/Search Tags:

Cluster-on-Chip, Multicore, SIMD, Implicit data permutation, Virtual bus on chip network, Hybrid programming model

Related items

1	Hierarchical Structure Of The Cluster Virtual Bus On-chip Interconnect Network Design And Research
2	Research On Construction Of Network Topology And On-Chip Router For Network On Chip (NOC)
3	Research Of Multi-level Parallelism Programming Pattern For Hybrid Parallel Computing Environment
4	Research On Data-driven Crosspoint-Queued On Chip Router
5	Study On Hybrid Parallel Molecular Dynamics Computing On Multicore Cluster
6	Automatic Generation And Optimization Of Data Permutation Instructions For Simd Devices
7	Automatic Generation And Optimization Of Data Permutation Instructions For SIMD Devices
8	Design And Exploration Of A Double-layer Optical Networks-on-chip Based On Virtual-cluster
9	Research On The Key Technology Of Multi-core Chip Design Based On High Density Computing
10	Design And Study Of On Chip Optical Interconnection For Multicore Processor