High performance and energy efficient multi-core systems for DSP applications

Posted on:2008-09-20

Degree:Ph.D

Type:Dissertation

University:University of California, Davis

Candidate:Yu, Zhiyi


GTID:1448390005969110

Subject:Engineering

Abstract/Summary:

This dissertation investigates the architecture design, physical implementation, result evaluation, and feature analysis of a multi-core processor for DSP applications. The system is composed of a 2-D array of simple single-issue programmable processors interconnected by a reconfigurable mesh network, and the processors operate completely asynchronously with respect to each other in a Globally Asynchronous Locally Synchronous (GALS) fashion. The processor is called the Asynchronous Array of simple Processors (AsAP). A 6 x 6 array has been fabricated in a 0.18 µm CMOS technology. The physical design addresses timing issues for robust implementation and takes full advantage of the architecture's potential scalability. Each processor occupies 0.66 mm², is fully functional at a clock rate of 520 to 540 MHz under 1.8 V, and dissipates 94 mW while the clock is 100% active. Compared to the high-performance TI C62x DSP processor, AsAP achieves 0.8 to 9.6 times higher performance and 10 to 75 times higher energy efficiency, with 7 to 19 times smaller area. The system is also easily scalable and well suited to future fabrication technologies.

An asymmetric inter-processor communication architecture is proposed. It assigns different buffer resources to the nearest-neighbor interconnect and to the long-distance interconnect, and it reduces the communication circuitry area by approximately 2 to 4 times compared to a traditional Network on Chip (NoC) while providing similar routing capability. A wide design space is explored, including support for long-distance communication in GALS systems, static and dynamic routing, varying numbers of ports (buffers) for the processing core, and varying numbers of links at each edge.

The GALS style typically introduces performance penalties due to the additional communication latency between clock domains. However, GALS chip multiprocessors with large inter-processor FIFOs, such as AsAP, can inherently hide much of this penalty, and the penalty can even be driven to zero. Furthermore, adaptive clock and voltage scaling for each processor provides approximately 40% power savings without any performance reduction.
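The per-processor figures above support a quick back-of-envelope check. The sketch below derives the average energy per clock cycle (E = P / f), total processor area, full-activity power, and peak instruction throughput for the 6 x 6 array. It assumes the 94 mW figure holds across the whole 520 to 540 MHz range, that every core is 100% active, and that a single-issue core retires at most one instruction per cycle; it ignores pads, I/O, and other chip-level overhead, so the results are processor-only estimates, not measured chip-level results.

```python
"""Back-of-envelope figures derived from the AsAP numbers quoted in the abstract.

Assumptions (not stated in the abstract): the 94 mW figure applies across the
whole 520-540 MHz range, all 36 cores are 100% active, each single-issue core
retires at most one instruction per cycle, and chip-level overhead is ignored.
"""

ROWS, COLS = 6, 6
NUM_PROCESSORS = ROWS * COLS     # 6 x 6 array
AREA_PER_PROC_MM2 = 0.66         # mm^2 per processor
POWER_PER_PROC_W = 0.094         # 94 mW with the clock 100% active
CLOCK_RANGE_HZ = (520e6, 540e6)  # functional clock range at 1.8 V


def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Average energy per clock cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12


def array_totals() -> dict:
    """Processor-only totals for the whole array (excludes pads and I/O)."""
    return {
        "area_mm2": NUM_PROCESSORS * AREA_PER_PROC_MM2,
        "power_w": NUM_PROCESSORS * POWER_PER_PROC_W,
        # Upper bound: one instruction per cycle per single-issue core.
        "peak_gips": [NUM_PROCESSORS * f / 1e9 for f in CLOCK_RANGE_HZ],
    }


if __name__ == "__main__":
    for f in CLOCK_RANGE_HZ:
        e = energy_per_cycle_pj(POWER_PER_PROC_W, f)
        print(f"{f / 1e6:.0f} MHz: {e:.1f} pJ per cycle per processor")
    totals = array_totals()
    print(f"array area  ~{totals['area_mm2']:.2f} mm^2")
    print(f"array power ~{totals['power_w']:.3f} W (all cores active)")
    print("peak GIPS   ", [round(g, 2) for g in totals["peak_gips"]])
```

Under these assumptions the figures work out to roughly 174 to 181 pJ per cycle per processor, about 23.8 mm² of processor area, about 3.4 W with every core active, and a peak of roughly 18.7 to 19.4 billion instructions per second; actual throughput depends on how busy each core is.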
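The claim that large inter-processor FIFOs can hide the GALS penalty can be illustrated with a toy model. In the sketch below, a producer and a consumer run on separate clocks (periods `tp` and `tc`, in abstract time ticks) and share a FIFO of `depth` words. A hypothetical `sync_latency` parameter models synchronizer delay: a write becomes visible to the consumer, and a freed slot becomes visible to the producer, only `sync_latency` ticks later. All names and the timing model are illustrative assumptions, not the AsAP FIFO circuit or the dissertation's evaluation method.

```python
from bisect import bisect_right


def simulate(n_words: int, depth: int, tp: int, tc: int, sync_latency: int) -> int:
    """Return the tick at which the consumer reads the last of n_words.

    Toy GALS link (hypothetical model): producer clock period tp, consumer
    clock period tc, a FIFO holding `depth` words, and a synchronization delay
    (>= 1 tick) before either side sees the other side's writes or reads.
    """
    if sync_latency < 1:
        raise ValueError("sync_latency must be at least 1 tick")
    write_times: list[int] = []  # tick of each write, in increasing order
    read_times: list[int] = []   # tick of each read, in increasing order
    t = 0
    while len(read_times) < n_words:
        # Producer clock edge: write if the FIFO does not look full from here.
        # Reads are only seen after the synchronization delay (pessimistic full).
        if t % tp == 0 and len(write_times) < n_words:
            reads_seen = bisect_right(read_times, t - sync_latency)
            if len(write_times) - reads_seen < depth:
                write_times.append(t)
        # Consumer clock edge: read if a synchronized word is available.
        if t % tc == 0:
            writes_seen = bisect_right(write_times, t - sync_latency)
            if writes_seen > len(read_times):
                read_times.append(t)
        t += 1
    return read_times[-1]


if __name__ == "__main__":
    for depth in (1, 32):
        for latency in (1, 10):
            done = simulate(100, depth, tp=2, tc=3, sync_latency=latency)
            print(f"depth={depth:2d} latency={latency:2d} -> last read at tick {done}")
```

In this model, with a 32-word FIFO, raising the latency from 1 to 10 ticks delays the last of 100 reads only from tick 300 to tick 309: a one-time fill delay, while the steady rate stays at one word per consumer cycle. With a 1-word FIFO, the same latency stretches the run to tick 2091, because every word must wait for a full round trip through both synchronization delays.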
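Why per-processor clock and voltage scaling saves power can be seen from the standard first-order model of CMOS dynamic power, where α is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. This model ignores leakage and is not necessarily the one used in the dissertation. The example numbers are hypothetical, not AsAP measurements: a processor whose workload needs only half its clock rate, and which can then run at 80% of the nominal supply, would use about a third of its original dynamic power.

```latex
P_{\mathrm{dyn}} \approx \alpha C V^{2} f,
\qquad
\frac{P'_{\mathrm{dyn}}}{P_{\mathrm{dyn}}} = \left(\frac{V'}{V}\right)^{2} \frac{f'}{f}
\quad\Longrightarrow\quad
\frac{V'}{V} = 0.8,\ \frac{f'}{f} = 0.5:\quad 0.8^{2} \times 0.5 = 0.32
```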

Keywords/Search Tags:

Performance, DSP, Processor, Clock, GALS
