
Highly efficient multithreaded architecture

Posted on: 2006-11-20
Degree: Ph.D
Type: Thesis
University: University of Rochester
Candidate: El-Moursy, Ali A
Full Text: PDF
GTID: 2458390008953426
Subject: Engineering
Abstract/Summary:
The performance and power optimization of dynamic superscalar microprocessors requires striking a careful balance between exploiting parallelism and simplifying hardware. Hardware structures that are needlessly complex may exacerbate critical timing paths and dissipate extra power. In a Simultaneous Multi-Threaded (SMT) processor, simplifying hardware structures is particularly challenging because multi-threading increases their utilization.

In this thesis, we take three approaches to increasing the efficiency of SMT processors. First, we propose new front-end policies that reduce the required integer and floating-point issue queue sizes in a monolithic SMT processor. We explore both general policies and policies directed towards alleviating a particular cause of issue queue inefficiency. For the same level of performance, the most effective policies reduce issue queue occupancy by 33% in an SMT processor with appropriately sized issue queue resources. Second, we examine processor partitioning options for a large number of on-chip threads. While growing transistor budgets permit four- and eight-thread processors to be designed, design complexity, power dissipation, and wire scaling limitations create significant barriers to their actual realization. We explore the design choices of sharing, or of partitioning and distributing, the front end (instruction cache, instruction fetch, and dispatch), the execution units and associated state, and the L1 Dcache banks in a Clustered Multi-Threaded (CMT) processor. We show that the best performance is obtained by restricting the sharing of the L1 Dcache banks and the execution engines among threads, while sharing the front-end resources extensively. Finally, we devise thread co-scheduling policies for a Chip Multi-Processor (CMP) of dual-threaded processors. Our techniques, which operate at time-slice granularity, use ready- and in-flight-instruction metrics to co-schedule compatible threads. We achieve a 20% performance improvement over the average of the possible arbitrary thread groupings, and about 30% better performance than the worst arbitrary grouping of threads.
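To make the co-scheduling idea concrete, the sketch below pairs threads onto dual-threaded cores using per-time-slice samples of ready and in-flight instruction counts. The ThreadSample structure, the combined demand score, and the high-with-low pairing rule are hypothetical choices for this illustration, not the exact policy evaluated in the thesis.

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct ThreadSample {
    std::string name;   // thread identifier
    double avgReady;    // average ready instructions observed in the last time slice
    double avgInFlight; // average in-flight instructions observed in the last time slice
};

// Combine the two metrics into one resource-demand score
// (equal weighting is an arbitrary choice made only for this sketch).
static double demand(const ThreadSample& t) {
    return t.avgReady + t.avgInFlight;
}

// At each time slice, pair the most demanding thread with the least
// demanding one, the second-most with the second-least, and so on, so
// that each dual-threaded core sees a roughly balanced load.
std::vector<std::pair<ThreadSample, ThreadSample>>
coSchedule(std::vector<ThreadSample> threads) {
    std::vector<std::pair<ThreadSample, ThreadSample>> pairs;
    if (threads.size() < 2) return pairs;
    std::sort(threads.begin(), threads.end(),
              [](const ThreadSample& a, const ThreadSample& b) {
                  return demand(a) < demand(b);
              });
    for (size_t lo = 0, hi = threads.size() - 1; lo < hi; ++lo, --hi) {
        pairs.emplace_back(threads[lo], threads[hi]);
    }
    return pairs;
}

Pairing a high-demand thread with a low-demand one is just one plausible notion of "compatible" threads; the thesis evaluates its specific metrics against arbitrary groupings, as summarized above.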
Keywords/Search Tags:Performance, Processor, Issue queue, SMT, Threads