Research And Implementation On The Key Techniques Of High Efficiency Computational DRAM Architecture

Posted on: 2013-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wu
Full Text: PDF
GTID: 1118330371480836
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
High performance computing (HPC) plays a very important role in the information industry, especially in scientific research and national economic activities. The hybrid computing architecture, organized as a host processor plus one or more coprocessors, has already become the mainstream in the development of high performance computer systems. However, the sustained performance of the coprocessors used in today's hybrid high performance computing systems is limited by their off-chip memory bandwidth and on-chip communication bandwidth, as well as by low efficiency (performance per watt). Furthermore, coprocessor design still faces several challenges, such as the "memory wall", the "power consumption wall", parallel programming, and on-chip interconnect communication.

Starting from balanced system design, this thesis focuses on the balance between memory bandwidth and communication bandwidth and introduces the computational DRAM (CDRAM) architecture to address these problems and challenges. The goal of the CDRAM architecture is to provide highly efficient computing capability based on embedded DRAM (eDRAM), network-on-chip (NoC), and multi-core technologies. Multiple processing elements are integrated in it according to a heterogeneous SIMD structure. The CDRAM processor can be attached to the host processor via a commodity memory interface and acts as a highly parallel computing coprocessor with high memory capacity to accelerate computing-intensive applications. It can cooperate and share memory with Intel and AMD processors to implement a more efficient hybrid computing system than conventional schemes.

This thesis investigates several key technologies of the CDRAM architecture. To satisfy the memory bandwidth requirements, the impact of the memory system on the processor's sustained performance is investigated first.
Based on memory-processor integration using eDRAM technology, a hierarchical memory system structure co-managed by software and hardware and a novel eDRAM structure are proposed. The proposed eDRAM structure is designed for high capacity, a wide memory bus, and low access latency, which can effectively increase the coprocessor's memory bandwidth.

A high performance processing element (PE) architecture is proposed according to the characteristics of the HPC applications implemented on CDRAM. To support the data-level parallelism of the applications effectively, a pipeline with vertical vector processing and sub-word parallelism in the function units are used. To address the control parallelism problem in SIMD architectures, independent control is implemented in the PE to support nested if-then-else constructs. Concurrent execution of different data streams on different subsets of the PE array can dramatically improve CDRAM's sustained performance in searching applications.

A traditional Mesh NoC usually has long communication latency and too many communication hops. To improve the performance of the NoC and meet the communication requirements of HPC applications, a broadcast and permutation network (BPN) is attached to the traditional Mesh NoC, and a hierarchical BP-Mesh NoC architecture is proposed. BP-Mesh has the advantages of high bandwidth, low latency, high scalability, and flexibility. It is implemented using circuit switching. The cost model, performance model, and power model of the BP-Mesh NoC are discussed in this thesis.
The experimental results show that the proposed BP-Mesh NoC not only has low design complexity, but also increases communication bandwidth and significantly decreases communication latency and communication power.

In addition, an explicitly computing and memory access parallelism (ECMP) programming model is proposed in this thesis to ease parallel program development and increase the utilization of transistors and memory bandwidth.

Finally, based on the UMC 0.18μm standard CMOS process, a four-core CDRAM prototype chip named ESCA is fabricated. Comprehensive performance and efficiency analyses are carried out under the CDRAM software development environment. Compared with other coprocessors, the experimental results show that the proposed CDRAM architecture can provide high performance computing capability and power efficiency with low hardware cost.
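
The abstract's remark that a traditional Mesh NoC needs many communication hops can be illustrated with a small calculation. The following is a minimal sketch, not taken from the thesis: it assumes a square k x k 2D mesh, dimension-ordered (XY) routing, and uniform traffic between distinct nodes. The function names `mesh_hops` and `mesh_stats` are hypothetical. It models only the baseline mesh and says nothing about the BP-Mesh design itself, whose details the abstract does not give.

```python
from itertools import product


def mesh_hops(src, dst):
    """Hop count between two (row, col) nodes in a 2D mesh.

    Assumes dimension-ordered (XY) routing, so the hop count is the
    Manhattan distance between the two nodes.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_stats(k):
    """Return (average hops, worst-case hops) for a k x k mesh, k >= 2.

    The average is taken over all ordered pairs of distinct nodes,
    which corresponds to the uniform-traffic assumption stated above.
    """
    nodes = list(product(range(k), repeat=2))
    dists = [mesh_hops(s, d) for s in nodes for d in nodes if s != d]
    return sum(dists) / len(dists), max(dists)


if __name__ == "__main__":
    # Both the average and the worst-case hop count grow with mesh size.
    for k in (2, 4, 8, 16):
        avg, worst = mesh_stats(k)
        print(f"{k}x{k} mesh: average {avg:.2f} hops, worst case {worst} hops")
```

Under these assumptions, the worst-case hop count is 2(k - 1) and the average also grows roughly linearly with k. This is the scaling behaviour that motivates adding a lower-latency path alongside the mesh.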
Keywords/Search Tags: high performance computing (HPC), hybrid computing, coprocessor, high efficiency, computational DRAM (CDRAM), embedded DRAM (eDRAM), network-on-chip (NoC), multi-core