
A scalable instruction queue design for exploiting parallelism

Posted on: 2005-05-29
Degree: Ph.D.
Type: Dissertation
University: University of Michigan
Candidate: Raasch, Steven Earl
Subject: Computer Science
GTID: 1458390008491942
Abstract/Summary:
To maximize the performance of wide-issue superscalar out-of-order microprocessors, the issue stage must extract as much instruction-level parallelism (ILP) as possible from the dynamic instruction stream. This dissertation examines several approaches to increasing available ILP while minimizing the impact on cycle time.

First, I describe and evaluate a novel instruction queue design, the Segmented Instruction Queue (Segmented IQ), which decouples IQ size from cycle time. The 512-entry Segmented IQ achieves between 58% and 98% of the performance of a similarly sized idealized instruction queue of conventional design, even though the latency of the latter is approximately 256 times larger. The Segmented IQ can also serve as a component of a clustered architecture, another approach to reducing cycle-time penalties in wide-issue machines, and the dependence-tracking mechanism it uses can be applied to the problem of instruction placement in clustered architectures.

By changing the mix of instructions present in the IQ, simultaneous multithreading (SMT) can also increase the amount of available ILP. Under SMT, partitioning schemes are needed to distribute resources among threads; however, some of these schemes, clustered architectures in particular, can significantly reduce SMT workload performance. If an SMT machine is to use a clustered microarchitecture, its instruction placement policy must be evaluated carefully to avoid performance degradation. Experiments show that naively allocating clusters to individual threads, which eliminates the dynamic sharing at the core of SMT, can reduce workload performance on a four-cluster architecture by as much as 26% compared with a simple load-balancing scheme.
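The gap between per-thread cluster allocation and load balancing can be illustrated with a toy dispatch model. This is a minimal sketch under stated assumptions, not the dissertation's simulator or placement policy. `NUM_CLUSTERS`, `CLUSTER_CAPACITY`, and both functions are hypothetical. The model covers a single dispatch window and ignores inter-cluster operand-forwarding latency, which a real placement policy must weigh. With unbalanced per-thread demand, pinning each thread to its own cluster leaves some clusters idle while another one fills up and stalls.

```python
from typing import Dict, List

NUM_CLUSTERS = 4        # assumption: four clusters, as in the experiment described above
CLUSTER_CAPACITY = 8    # hypothetical number of IQ entries per cluster


def dispatch_static(demand: Dict[int, int]) -> List[int]:
    """Pin each thread to one cluster (thread id modulo cluster count).

    Returns the per-cluster occupancy after dispatch. Instructions that do not
    fit in their thread's cluster stall, even if other clusters are empty.
    """
    occupancy = [0] * NUM_CLUSTERS
    for thread, count in demand.items():
        cluster = thread % NUM_CLUSTERS
        occupancy[cluster] += min(count, CLUSTER_CAPACITY - occupancy[cluster])
    return occupancy


def dispatch_load_balanced(demand: Dict[int, int]) -> List[int]:
    """Send every instruction to the least-occupied cluster, regardless of thread.

    Dispatch stops only when the least-occupied cluster is full, i.e. when
    every cluster is full.
    """
    occupancy = [0] * NUM_CLUSTERS
    for _thread, count in demand.items():
        for _ in range(count):
            cluster = min(range(NUM_CLUSTERS), key=lambda c: occupancy[c])
            if occupancy[cluster] >= CLUSTER_CAPACITY:
                break  # all clusters are full; remaining instructions stall
            occupancy[cluster] += 1
    return occupancy
```

For example, with demand `{0: 20, 1: 4}` (thread 0 issues 20 instructions, thread 1 issues 4), the static policy fills only 12 of 32 entries (`[8, 4, 0, 0]`), while the load-balanced policy places all 24 (`[6, 6, 6, 6]`). These numbers are purely illustrative and are not related to the 26% figure reported above.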
This dissertation presents data characterizing the performance of SMT workloads in clustered architectures using both conventional and segmented instruction queues.

Individually, these mechanisms are viable approaches to increasing available ILP. When the Segmented IQ is used in an SMT processor design, workload performance averages 80% and 86% of idealized performance for two- and four-thread workloads, respectively. This indicates that the two techniques can be combined effectively to increase processor utilization and performance.
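The central idea of a segmented queue can be illustrated with a toy model. Wakeup/select operates only on a small, fixed-size issue segment, while total capacity grows by adding segments. This is a simplified sketch, not the dissertation's design, and several parts of it are assumptions made for illustration. The names `Instr`, `SegmentedQueue`, and `ready_cycle` are hypothetical. `ready_cycle` is a stand-in for the dependence-chain-based prediction of when an instruction's operands will be ready. The one-segment-per-cycle promotion rule is likewise a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # compare instructions by identity, not by field values
class Instr:
    name: str
    ready_cycle: int  # hypothetical: cycle at which all source operands are available


class SegmentedQueue:
    def __init__(self, num_segments: int, segment_size: int) -> None:
        self.segment_size = segment_size
        # segments[0] is the issue segment; higher indices are farther from issue.
        self.segments: List[List[Instr]] = [[] for _ in range(num_segments)]

    def insert(self, instr: Instr, now: int) -> bool:
        """Place instr in the segment matching its predicted wait.

        If that segment is full, use the nearest higher segment with room.
        Returns False (dispatch stalls) if no such segment has space.
        """
        wait = max(0, instr.ready_cycle - now)
        start = min(wait, len(self.segments) - 1)
        for idx in range(start, len(self.segments)):
            if len(self.segments[idx]) < self.segment_size:
                self.segments[idx].append(instr)
                return True
        return False

    def cycle(self, now: int, issue_width: int) -> List[str]:
        """Simulate one cycle and return the names of the issued instructions."""
        # Wakeup/select scans only segment 0, whose size is fixed no matter
        # how many segments (total entries) the queue has. Oldest-first order.
        issued = [i for i in self.segments[0] if i.ready_cycle <= now][:issue_width]
        self.segments[0] = [i for i in self.segments[0] if i not in issued]
        # Promote instructions one segment closer to issue where space permits;
        # walking upward means each instruction moves at most one level per cycle.
        for idx in range(1, len(self.segments)):
            below = self.segments[idx - 1]
            while self.segments[idx] and len(below) < self.segment_size:
                below.append(self.segments[idx].pop(0))
        return [i.name for i in issued]
```

In this toy model, adding segments increases capacity without enlarging the set of entries that `cycle` examines for issue. The model does not capture how the actual design predicts and tracks dependence chains.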
Keywords: Instruction, Performance, Segmented IQ, Available ILP, SMT