Research On Instruction Dynamic Mapping Algorithm Of EDGE Architecture

Posted on:2013-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Gao

Full Text:PDF

GTID:2268330392968735

Subject:Microelectronics and Solid State Electronics

Abstract/Summary:


The monolithic structures commonly used in out-of-order superscalar processors have severely limited further improvement in microprocessor performance. EDGE, one of the models proposed to address this bottleneck, abandons the monolithic, power-hungry, and unscalable structures in its architecture model. In a distributed EDGE architecture, instructions are mapped onto several tiles and execute concurrently. Operand communication among tiles incurs delay and degrades performance. The instruction mapping algorithm tries to mitigate this loss by carefully balancing inter-tile communication delay against the degree of parallelism.

In the TRIPS microprocessor, critical resources are distributed asymmetrically in the topology and a static instruction mapping algorithm is used. This leads to load imbalance across the execution tiles (ETs) and hot spots in the operand network (OPN), both of which degrade performance.

In this thesis, a TRIPS-like EDGE architecture is implemented in the M5-EDGE simulator to study the dynamic instruction mapping algorithm Deep. Results show that Deep mapping with round-robin ET selection achieves 85% and 98.3% of the performance of SPDI at issue widths 1 and 2, respectively, without compiler scheduling or optimization. Three optimizations take the topological locations of the register tiles (RTs) and data tiles (DTs) into account: choosing ETs in numbering sequence, choosing ETs in zigzag sequence, and choosing a tile by calculating the global communication hops of a hyperblock. Compared with the base Deep mapping algorithm at issue width 1, these reduce average hops by 2.63%, 2.18%, and 4.70%, and improve IPC by 1.07%, 1.21%, and 2.11%, respectively. Optimizations that reduce the communication hops of instructions can therefore improve IPC notably.

Over 90% of operands are delivered through the local bypass path in Deep mapping, which greatly relieves the load on the OPN. Simulation shows that when the bypass width is twice the issue width, the delay of local operand bypass is nearly 0. Increasing the width of the local bypass path is thus an effective way to reduce operand delivery latency.

When RTs are placed into ETs according to register number, the IPC of the base Deep mapping algorithm improves by 1.77%. Two further optimizations take DT locations into account: preferentially selecting ETs close to DTs, and calculating a hyperblock's communication hops to select a suitable ET. These improve IPC by 1.17% and 1.89%, respectively, compared with the base Deep algorithm. When both RTs and DTs are distributed into ETs, a 4x4 grid topology is obtained. On this topology, the IPC of the Deep mapping algorithm is 97.18% and 113.42% of that of SPDI at issue widths 1 and 2, respectively. With a simple optimization on top of Deep, these figures become 97.32% and 114.06%. IPC improves notably when topology hops decrease, whether through micro-architecture changes or through optimizations of the Deep mapping algorithm.
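The abstract does not spell out the Deep mapping algorithm itself, so the following is only a minimal sketch of the two ET selection policies it names: round-robin selection, and selection by total communication hops. It assumes a 4x4 grid of tiles addressed as (row, column), Manhattan distance as the hop count, and ties broken by numbering order. The function names (`hops`, `round_robin_tiles`, `min_hops_tile`) are hypothetical and are not taken from the thesis or from the M5-EDGE simulator.

```python
from itertools import cycle, islice

GRID = 4  # assumed 4x4 grid of tiles, matching the topology in the abstract


def all_tiles(grid=GRID):
    """List tiles as (row, col) in numbering (row-major) order."""
    return [(r, c) for r in range(grid) for c in range(grid)]


def hops(a, b):
    """Assumed hop count: Manhattan distance between two tiles."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def round_robin_tiles(grid=GRID):
    """Endless iterator that hands out tiles in numbering order."""
    return cycle(all_tiles(grid))


def min_hops_tile(producers, candidates):
    """Pick the candidate tile with the fewest total hops to the
    tiles that produce its operands; ties go to the lower-numbered tile."""
    return min(candidates, key=lambda t: (sum(hops(t, p) for p in producers), t))


# Round-robin ignores where operands come from.
first_three = list(islice(round_robin_tiles(), 3))

# Hop-aware selection places the consumer between its producers.
producers = [(1, 1), (1, 3), (3, 2)]
chosen = min_hops_tile(producers, all_tiles())
total = sum(hops(chosen, p) for p in producers)
```

In this sketch the hop-aware policy places the consumer at tile (1, 2), 4 hops in total from its three producers, whereas round-robin would simply take the next tile in sequence regardless of distance. This is the trade-off the abstract describes between balancing load across ETs and reducing operand communication hops.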

Keywords/Search Tags:

EDGE architecture, dynamic instruction mapping, performance analysis


Related items

1	Research On Static Instruction Mapping Algorithm Of Edge Architecture
2	Research On Real-time Simulation Of RISC-V Instruction System Based On Shenwei CPU
3	Research On The Custom Instruction Mapping Of Application Specific Instruction Set Processors
4	Quantitative Analysis On The Impact Of Memory Access Behavior By Instruction Dynamic Scheduling
5	Research On Instruction Set Simulation Technology For Binary Analysis
6	Block-aware instruction set architecture
7	A Modular and Extensible Architecture Integrating Sensors, Dynamic Displays of Anatomy and Physiology, and Automated Instruction for Innovations in Clinical Education
8	Research And Design Of High Performance Digital Signal Processor
9	Software Performance And Status Analysis Using HPM
10	Research On The Design And Implementation Techniques Of Customizing Application Specific Instruction Set Processors