Hardware optimizations enabled by a decoupled fetch architecture

Posted on:2002-10-28

Degree:Ph.D

Type:Thesis

University:University of California, San Diego

Candidate:Reinman, Glenn D

GTID:2468390011992399

Subject:Computer Science

Abstract/Summary:

In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. In order to provide the performance necessary to meet future processor execution targets, the instruction delivery mechanism must scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictions, and taken branches in the instruction stream. Moreover, there are a number of hardware scaling issues such as wire latency, clock scaling, and energy dissipation that can impact processor design.

To address these issues, this thesis presents a fetch architecture that decouples the branch predictor from the instruction fetch unit. A Fetch Target Queue (FTQ) is inserted between the branch predictor and instruction cache. This allows the branch predictor to run far in advance of the address currently being fetched by the instruction cache. The decoupling enables a number of architectural optimizations including multi-level branch predictor design and fetch directed instruction prefetching.

A multi-level branch predictor design consists of a small first level predictor that can scale well to future technology sizes and larger higher level predictors that can provide capacity for accurate branch prediction.

Fetch directed instruction cache prefetching uses the stream of fetch addresses contained in the FTQ to guide instruction cache prefetching. By following the predicted fetch path, this technique provides more accurate prefetching than simply following a sequential fetch path.

Fetch directed prefetching using a contemporary set-associative instruction cache has some complexity and energy dissipation concerns. Set-associative caches provide a great deal of performance benefit, but dissipate a large amount of energy by blindly driving a number of associative ways. By decoupling the tag and data components of the instruction cache, a complexity effective and energy efficient scheme for fetch directed instruction cache prefetching can be enabled.

This thesis explores the decoupled front-end design and these related optimizations, and suggests future research directions.
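
As an illustration of the decoupling described in the abstract, the following is a minimal Python sketch of a branch predictor filling an FTQ ahead of the instruction cache, with a prefetcher that scans the queued fetch targets. It is a toy model under stated assumptions, not the thesis's design: the class and method names, the block-granularity addresses, the `taken_targets` table standing in for a branch predictor, and the zero-latency prefetch are all hypothetical simplifications.

```python
from collections import deque


def next_fetch_block(block, taken_targets):
    """Hypothetical predictor: follow a predicted-taken branch if one is
    recorded for this block, otherwise fall through to the next block."""
    return taken_targets.get(block, block + 1)


class DecoupledFrontEnd:
    """Toy model: predictor -> FTQ -> instruction cache (assumed structure)."""

    def __init__(self, start_block, taken_targets, ftq_capacity=4):
        self.taken_targets = taken_targets
        self.ftq = deque()                # queue of predicted fetch blocks
        self.ftq_capacity = ftq_capacity
        self.predict_block = start_block  # where the predictor currently is
        self.icache = set()               # blocks resident in the I-cache
        self.prefetches = []              # order in which blocks were prefetched

    def predictor_step(self):
        # The predictor runs ahead of fetch as long as the FTQ has room.
        if len(self.ftq) < self.ftq_capacity:
            self.ftq.append(self.predict_block)
            self.predict_block = next_fetch_block(
                self.predict_block, self.taken_targets
            )

    def prefetch_step(self):
        # Scan FTQ entries behind the head and prefetch the first one that
        # is not cached (assumption: the prefetch completes immediately).
        for block in list(self.ftq)[1:]:
            if block not in self.icache:
                self.icache.add(block)
                self.prefetches.append(block)
                break

    def fetch_step(self):
        # The fetch unit consumes the oldest FTQ entry.
        if not self.ftq:
            return None
        block = self.ftq.popleft()
        hit = block in self.icache
        self.icache.add(block)            # a miss fills the cache
        return block, hit


# Usage: block 2 ends in a predicted-taken branch to block 10.
fe = DecoupledFrontEnd(start_block=0, taken_targets={2: 10})
for _ in range(4):
    fe.predictor_step()                   # FTQ now holds blocks 0, 1, 2, 10
for _ in range(3):
    fe.prefetch_step()                    # prefetches follow the predicted path
results = [fe.fetch_step() for _ in range(4)]
```

In this run the prefetcher brings in blocks 1, 2, and 10 along the predicted path, whereas a purely sequential prefetcher would have requested block 3 after block 2.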

Keywords/Search Tags:

Fetch, Instruction, Optimizations, Branch predictor

Related items

1	Research On Fetch Control Mechanism Based On SMT Processors
2	Research On Simultaneous Multithreaded Processor Front-end
3	The Research Of Key Technique On Instruction Control Unit In The X-Microprocessor
4	Design And Verification Of Instruction Fetch And Dispatch Unit In High Performance DSP
5	Branch optimizations and instruction-level parallelism exploitation for dynamic superscalar and VLIW processors
6	Design And Implementation Of The Instruction Fetch Unit And Multiple Instruction Flows Extension In The YHFT-Matrix DSP
7	Multi-threaded Processor Storage Structure Study
8	Records Branch Predictor
9	TG-share Branch Predictor Designed For Frequent Context Switching
10	Performance Modeling Of Branch Predictors In Out-of-order Processors