Implementation and performance evaluation of scheduled dataflow (SDF) architecture

Posted on: 2002-06-04
Degree: Ph.D
Type: Dissertation
University: The University of Alabama in Huntsville
Candidate: Arul, Joseph Maria
Full Text: PDF
GTID: 1468390011997875
Subject: Computer Science
Abstract/Summary:
This dissertation presents the simulated implementation and evaluation of a non-blocking, multithreaded architecture with decoupled memory access and execution, known as the Scheduled Dataflow (SDF) architecture. Recent work on new processor architectures has focused mainly on Very Long Instruction Word (VLIW) designs (e.g., Itanium), superscalar designs and superspeculative designs. This trend yields better performance at the expense of increased hardware complexity, and possibly higher power consumption resulting from dynamic instruction scheduling. The SDF system deviates from this trend by exploring a simpler, yet powerful execution paradigm based on dataflow, multithreading and the decoupling of memory accesses from execution. A program is partitioned into non-blocking execution threads. In addition, all memory accesses are decoupled from the thread's execution: data is pre-loaded into the thread's context (registers), and all results are post-stored after the thread completes execution. This decoupling requires a separate unit to perform the necessary pre-loads and post-stores and to control the allocation of hardware thread contexts to enabled threads. Thus, SDF contains two units, called the Synchronization Processor (SP) and the Execution Processor (EP).

Even though multithreading and decoupling are possible with a control-flow architecture, the non-blocking and functional nature of the SDF system makes it easier to coordinate the memory accesses and execution of a thread, and to eliminate unnecessary dependencies among instructions. The evaluation compares the execution cycles of SDF with those of a MIPS architecture (DLX simulator). The SDF simulator can also be easily modified to contain more than a single SP and a single EP. Execution cycles on superscalar (SimpleScalar simulator) and VLIW (as facilitated by the Trimaran simulator and TMSC6000) architectures are compared with those of an SDF system consisting of multiple SPs and EPs.

Our performance comparisons show that the SDF system consistently outperforms a MIPS-like system. The SDF system also outperforms superscalar and VLIW architectures when the number of functional units (viz., integer and floating point units, or EPs and SPs) exceeds a certain number. The performance improvements of the SDF system result from multithreading and decoupling.

This dissertation relies on an instruction set simulator for the SDF system and on hand-coded benchmarks.
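
The three-phase thread model described in the abstract (pre-load, execute, post-store) can be illustrated with a toy Python sketch. This is not the dissertation's simulator: the `Thread` class, the queue names, the register names and the `run` function are all hypothetical, and the sketch ignores cycle counts, synchronization counts and the overlap of SP and EP work that the real design relies on. It only shows that a thread body touches registers alone, while all memory traffic happens before and after it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Thread:
    """Hypothetical non-blocking thread (names are illustrative only)."""
    name: str
    loads: Dict[str, str]                    # register -> memory address to pre-load
    body: Callable[[Dict[str, int]], None]   # register-to-register work only
    stores: Dict[str, str]                   # memory address -> register to post-store
    regs: Dict[str, int] = field(default_factory=dict)


def run(threads: List[Thread], memory: Dict[str, int]) -> List[str]:
    """Drive each thread through pre-load (SP), execute (EP), post-store (SP)."""
    preload_q = deque(threads)
    execute_q: deque = deque()
    poststore_q: deque = deque()
    trace: List[str] = []
    while preload_q or execute_q or poststore_q:
        if preload_q:
            # SP role: copy every input from memory into the thread's context.
            t = preload_q.popleft()
            t.regs = {reg: memory[addr] for reg, addr in t.loads.items()}
            execute_q.append(t)
            trace.append(f"SP pre-load {t.name}")
        if execute_q:
            # EP role: run to completion without touching memory.
            t = execute_q.popleft()
            t.body(t.regs)
            poststore_q.append(t)
            trace.append(f"EP execute {t.name}")
        if poststore_q:
            # SP role: write results back only after execution has finished.
            t = poststore_q.popleft()
            for addr, reg in t.stores.items():
                memory[addr] = t.regs[reg]
            trace.append(f"SP post-store {t.name}")
    return trace


def add_body(regs: Dict[str, int]) -> None:
    regs["r2"] = regs["r0"] + regs["r1"]


def double_body(regs: Dict[str, int]) -> None:
    regs["r1"] = regs["r0"] * 2


# Assumed example program: z = x + y, then w = 2 * z.
memory = {"x": 3, "y": 4}
trace = run(
    [
        Thread("sum", {"r0": "x", "r1": "y"}, add_body, {"z": "r2"}),
        Thread("double", {"r0": "z"}, double_body, {"w": "r1"}),
    ],
    memory,
)
```

In this sketch the second thread depends on the first only through memory, and the simple queue order happens to satisfy that dependency. A real SDF system would enable a thread only once its inputs are available, which the sketch does not model.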
Keywords/Search Tags: SDF, Architecture, Execution, Evaluation, Dataflow, Thread, Performance, Simulator