
Research on Hierarchical Memory Systems for Multicore Processors

Posted on: 2013-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: R J Xiao  Full Text: PDF
GTID: 2248330395451256  Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Recently, led by tablet computers and smartphones, consumer electronics products have seen unprecedented market growth, which provides tremendous impetus for the upstream integrated circuit industry. As the core unit of an electronic device, the processor is shifting its design focus from high performance alone to both high performance and low power. However, because the pace of process technology innovation is slowing, raw performance can no longer be improved consistently by the traditional approach of raising the clock frequency. Meanwhile, multicore systems, with their inherent parallelism and flexibility, have become the mainstream architecture of current processors. For energy-sensitive embedded applications, multicore processors have proved efficient thanks to their parallel computing capability, scalability, and low-power potential.

The research object of this thesis is the hierarchical memory system for multicore processors. Building on the memory systems of existing multicore processors, this thesis proposes a memory hierarchy that is better suited to embedded applications. The goal is to achieve both high performance and low power through innovative design of the memory hierarchy, in order to meet the needs of embedded applications.

The innovative work of the thesis can be summarized as follows:

(1) Cluster-based memory hierarchy
We propose a cluster-based memory hierarchy. The weights of the different layers in the hierarchy are optimized according to the requirements of embedded applications. Data locality is improved by the extended register file design, and hardware overhead is reduced by the cache-free design. Partitioning data memory into private and shared parts improves both data locality and the layering of the memory hierarchy.

(2) Extended register file design
We propose an extended register file scheme for the cluster-based memory hierarchy that remains compatible with a 32-bit instruction width. The number of registers is doubled from 32 to 64, which improves both data locality and overall performance. We also propose a modified message-passing inter-core communication mechanism that uses the address space of the extended register file. Test results show that the number of instructions related to inter-core communication can be reduced by around 50% with the proposed method, and that inter-core communication efficiency is improved.

(3) Cache-free design
We adopt a cache-free design in the cluster-based memory hierarchy and use private data memory as the substitute, so chip area and power budget are reduced significantly. A new direct-memory-access communication method based on the private data memory is also proposed. With the transfer designated in the packet header, message-passing communication can be completed without the participation of the processor cores. Both communication and computation efficiency are thus improved.

(4) Shared memory units within the cluster
In the cluster-based memory hierarchy, we implement shared memory that can be accessed by all processors in the same cluster. We also propose a novel shared-memory communication mechanism and a simple hardware-aided mailbox synchronization method. Dividing the memory into private and shared parts improves data locality and alleviates memory access latency.

(5) Chip implementation and applications
A 16-core processor with the proposed cluster-based memory hierarchy is implemented in the TSMC 65 nm LP CMOS process. The chip contains two clusters, each with 8 processor cores and 1 memory core. The chip occupies 9.1 mm², and each core occupies 0.43 mm². The maximum operating clock frequency at 1.2 V is 750 MHz. We implemented a 3780-point FFT module on the chip to evaluate the improvement in both performance and energy efficiency due to the cluster-based memory hierarchy. Test results show that the typical power consumption of each processor core is 34 mW, lower than that of most comparable prior designs.
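
To make item (4) of the abstract more concrete, the following is a minimal software sketch of shared-memory communication guarded by a mailbox. It is not the thesis's hardware design, which the abstract does not describe in detail. It assumes a generic single-slot mailbox with a full/empty flag per core. The names `Mailbox`, `Cluster`, `send`, and `receive`, the 64-word memory size, and the (address, length) message format are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Mailbox:
    """Hypothetical single-slot mailbox: one payload guarded by a full flag."""
    full: bool = False
    payload: Optional[Tuple[int, int]] = None

    def try_post(self, value: Tuple[int, int]) -> bool:
        # A sender may write only while the slot is empty.
        if self.full:
            return False
        self.payload = value
        self.full = True
        return True

    def try_take(self) -> Optional[Tuple[int, int]]:
        # A receiver may read only while the slot is full; reading empties it.
        if not self.full:
            return None
        value = self.payload
        self.payload = None
        self.full = False
        return value


@dataclass
class Cluster:
    """Hypothetical cluster: one shared memory and one mailbox per core."""
    num_cores: int = 8    # the abstract states 8 processor cores per cluster
    mem_words: int = 64   # assumed size, for illustration only
    shared_mem: list = field(init=False)
    mailboxes: list = field(init=False)

    def __post_init__(self):
        self.shared_mem = [0] * self.mem_words
        self.mailboxes = [Mailbox() for _ in range(self.num_cores)]

    def send(self, dst: int, addr: int, data: list) -> bool:
        """Write data to shared memory, then notify core `dst` via its mailbox."""
        if addr < 0 or addr + len(data) > self.mem_words:
            raise ValueError("buffer does not fit in shared memory")
        box = self.mailboxes[dst]
        if box.full:
            # The receiver has not consumed the previous message yet.
            return False
        self.shared_mem[addr:addr + len(data)] = data
        return box.try_post((addr, len(data)))

    def receive(self, core: int) -> Optional[list]:
        """Poll the mailbox of `core`; return the data it points to, if any."""
        msg = self.mailboxes[core].try_take()
        if msg is None:
            return None
        addr, length = msg
        return self.shared_mem[addr:addr + length]


cluster = Cluster()
cluster.send(3, 8, [1, 2, 3])   # some core posts three words for core 3
words = cluster.receive(3)      # core 3 picks them up from shared memory
```

In this sketch the data itself stays in shared memory and only a small descriptor passes through the mailbox, so the flag acts as the synchronization point between producer and consumer. A hardware-aided version would presumably implement the flag and the notification in logic rather than by software polling.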
Keywords/Search Tags: Multicore Processor, Embedded Application, Memory Hierarchy, Low-power Design, Inter-core Communication