Compiling For The Speculative Multithreading Architecture

Posted on:2002-05-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:K Deng

Full Text:PDF

GTID:1118360065461567

Subject:Computer Science and Technology

Abstract/Summary:

High-performance, general-purpose microprocessors serve as the compute engines of machines ranging from personal computers to supercomputers, and sequential programs make up a major portion of the real-world software that runs on them. State-of-the-art microprocessors exploit instruction-level parallelism (ILP) by searching a dynamic window of instructions for independent instructions and executing them on a wide-issue pipeline. However, enlarging the window and the issue width to extract more ILP can make high clock speeds harder to reach and thus limit overall performance, especially in the forthcoming era of a billion transistors per chip.

The Speculative Multithreading Architecture (SMA) instead uses a decentralized organization: multiple small instruction windows and many narrow-issue execution units together exploit large amounts of ILP. Sequential programs are partitioned into code fragments called threads, which are executed speculatively in parallel. Previous research has shown that the SMA architecture can achieve a substantial performance gain and efficient resource utilization.

Compiler optimization plays a central role in SMA research. Three key factors limit the performance of an SMA processor: load imbalance across contexts, inter-thread control dependences, and inter-thread data dependences. To sustain the performance gain, the SMA compiler must eliminate these factors thoroughly.

The contributions of this thesis are as follows:

1. It thoroughly investigates the execution behavior of various applications on the SMA architecture and identifies the key performance factors.
2. It presents a set of heuristic rules to accelerate the speculative execution of SMA threads, including an optimized thread partitioning strategy, a context load-balancing strategy, and a DEE-like thread mapping strategy.
3. It reviews the memory bandwidth requirements of the SMA processor and compares different instruction fetch policies. To improve cache performance under the SMA model, it introduces a cooperative hardware/software optimization: on the software side, the compiler explicitly inserts prefetch instructions; on the hardware side, an SMA cache filter is added to discard unnecessary prefetches.
4. Guided by a feedback-based optimization strategy, it presents SMARCOF, a continuous optimization framework driven by dynamic profiles. SMARCOF is built on the DLX simulator, extended with SMA-specific features and the heuristic optimization rules. Simulations of SPEC code show that these rules exploit hybrid parallelism effectively with relatively low overhead.

In conclusion, the SMA architecture is a promising way to build high-performance processors, and the continuous optimization framework SMARCOF can use dynamic execution profiles and heuristic rules to remove SMA performance bottlenecks effectively. The preliminary work discussed in this thesis shows encouraging potential for performance gains and good application compatibility for SMARCOF, and further improvements can be expected.
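The abstract names an SMA cache filter that discards unnecessary prefetches but does not say how it decides. As a rough illustration only, the Python sketch below models one common filtering idea under stated assumptions: a compiler-inserted prefetch is dropped if its cache line is already resident, or if the same line was prefetched recently (for example, by another thread unit working on overlapping data). The class `PrefetchFilter`, the helper `filter_stream`, the 32-byte line size, and the small LRU table of recent prefetches are all hypothetical and are not taken from the thesis.

```python
from collections import OrderedDict
from typing import Iterable, List, Set

LINE_SIZE = 32  # assumed cache-line size in bytes (hypothetical)


class PrefetchFilter:
    """Hypothetical prefetch filter: drops prefetches that would be redundant."""

    def __init__(self, capacity: int = 16) -> None:
        self.capacity = capacity
        # Line numbers of recent prefetches, oldest first (LRU order).
        self.recent: "OrderedDict[int, None]" = OrderedDict()

    def should_issue(self, addr: int, cached_lines: Set[int]) -> bool:
        """Return True if a prefetch of byte address `addr` should go to memory.

        `cached_lines` holds line numbers (addr // LINE_SIZE) already in the cache.
        """
        line = addr // LINE_SIZE
        if line in cached_lines:
            return False  # line already resident: prefetch would waste bandwidth
        if line in self.recent:
            self.recent.move_to_end(line)  # refresh its LRU position
            return False  # same line was prefetched recently (e.g. by another thread)
        self.recent[line] = None
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evict the least recently used entry
        return True


def filter_stream(requests: Iterable[int], cached_lines: Set[int],
                  capacity: int = 16) -> List[int]:
    """Pass an interleaved stream of prefetch addresses through one shared filter."""
    f = PrefetchFilter(capacity)
    return [a for a in requests if f.should_issue(a, cached_lines)]


if __name__ == "__main__":
    # Two thread units prefetching overlapping data; line 0 is already cached.
    print(filter_stream([0, 32, 64, 32, 64, 96], {0}))  # [32, 64, 96]
```

In the usage line, the request for address 0 is dropped because line 0 is cached, and the repeated requests for addresses 32 and 64 are dropped as duplicates, so only three prefetches reach memory. A bounded LRU table is used so the filter's state stays small, as a hardware structure would need to be; the real SMA filter may use a different policy.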

Keywords/Search Tags:

SMA, Compiler Optimization, Prefetch, Dynamic Execution Profile, Feedback-based Optimization

Related items

1	Research On Cross-Platform Compiler Analysis And Optimization Technique Based On Peak Architecture
2	Research On Compiler Optimization Technologies For THUMP
3	Research On Optimization Technology Of Compiler And Memory Access For Domestic Sunway Platform
4	Compiler Optimization Recommendation For Symbolic Execution To Improve MC/DC Coverage
5	The Research And Implementation Of Predicated Execution
6	A Quick And Generic Approach Of Selecting Compiler Optimization Options
7	Open-source Compiler Orc System Optimization Techniques
8	Low-Power Techniques For Architecture And Compiler Optimization
9	Design And Implementation Of Indirect Prefetch Algorithm Based On Shenwei GCC Compiler
10	Profile-Guided Optimization Based On Network Processors