
Compilation Techniques And Compiler Optimizations For Dataflow-Like Driven Tiled Processor Architecture

Posted on: 2010-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 1118360302471490
Subject: Computer system architecture
Abstract/Summary:
Tiled processor architecture is an effective response to the challenges of wire delay, power consumption, and design and implementation complexity. It is an efficient chip-multiprocessor scheme with good performance potential. The key question in current research on tiled processor architectures is how to accelerate general-purpose applications, a problem that can be solved only through hardware-software co-design. The primary challenge in that co-design is to design an instruction set that exposes high parallelism at low energy consumption, and to design and implement a high-performance parallelizing compiler that targets it. This dissertation focuses on the design of a dataflow-like instruction set architecture for the tiled processor TPA-PI and on the compilation and optimization techniques that target it, including how to let software manage the on-chip computation and storage resources, how to discover and exploit parallelism in applications, and how to develop new compilation techniques suited to tiled processors. The major contributions of this work are as follows:

(1) Designed DISC-I, a dataflow-like instruction set for the TPA-PI processor. DISC-I is characterized by atomic execution of instruction blocks and direct communication among instructions. It efficiently supports instruction-level parallelism, reduces the cost of each individual instruction through block-atomic execution, exploits multi-level and multi-granularity parallelism in applications, and further reduces the complexity of the hardware design.

(2) Designed and implemented a parallelizing compiler, based on the LLVM framework, that automatically parallelizes serial applications. It breaks an application into a series of hyperblocks, assigns hardware resources to each hyperblock, and maps each instruction onto the hardware. The distinguishing features of the TPA-PI compiler are its intermediate representation, TPA-C, and its instruction mapping method, which takes both hardware resources and application behavior into account. TPA-C represents a program as a two-level structure: a control-flow graph whose nodes are hyperblocks, and, within each hyperblock, a local data-flow graph whose nodes are instructions.

(3) Proposed a rule for choosing a hyperblock construction algorithm that yields higher control-flow predictability. We designed five different hyperblock construction algorithms and analyzed their influence on the control predictability of the resulting hyperblocks. Experiments show that using path execution frequency or path width as the heuristic can improve branch prediction accuracy, especially for deep prediction, although the benefit depends on the control behavior of the application. From this we derive a rule for choosing the heuristic for different kinds of applications: when an application has only a few hot paths, choose the path-width-oriented algorithm; when it has several hot paths, choose the path-execution-frequency-oriented algorithm.

(4) Designed a cooperative software/hardware hyperblock-level branch predictor that combines the low overhead of software with the flexibility and efficiency of hardware. This predictor needs only half of the storage resources and reduces the number of hardware predictor invocations by 0.1% to 15%, lowering both hardware implementation cost and power consumption while maintaining prediction accuracy.

(5) Proposed a method for implementing predicated execution on the TPA-PI architecture. Its advantages include low overhead and full support for every instruction in the TPA-PI ISA, but it also incurs the cost of software fanout trees. To remove these software fanout trees, we propose a profile-guided optimization that preserves high parallelism while eliminating useless execution.

The work in this dissertation can guide the design of parallel programming models and compilers for tiled processor architectures, can inform the design of high-performance on-chip multiprocessor architectures, and can help expose more parallelism in applications with lower hardware and software complexity and less difficulty in parallel programming.
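The two-level structure described for TPA-C in contribution (2) can be illustrated with a small sketch. This is not the dissertation's actual data layout: the class names `Instruction`, `Hyperblock`, and `Program`, the `targets` field, and the `dataflow_order` helper are all hypothetical, chosen only to show a control-flow graph of hyperblocks in which each hyperblock holds a local data-flow graph whose instructions name their consumers directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instruction:
    """One node of a hyperblock's local data-flow graph (hypothetical layout)."""
    opcode: str
    # Indices of the instructions in the same hyperblock that consume this
    # result, modelling "direct communication among instructions".
    targets: List[int] = field(default_factory=list)


@dataclass
class Hyperblock:
    """One node of the program-level control-flow graph (hypothetical layout)."""
    name: str
    instructions: List[Instruction] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)  # names of next hyperblocks

    def dataflow_order(self) -> List[int]:
        """Return instruction indices in an order that respects data dependences."""
        # Count how many producers each instruction is still waiting for.
        pending = [0] * len(self.instructions)
        for inst in self.instructions:
            for t in inst.targets:
                pending[t] += 1
        ready = [i for i, n in enumerate(pending) if n == 0]
        order: List[int] = []
        while ready:
            i = ready.pop(0)
            order.append(i)
            for t in self.instructions[i].targets:
                pending[t] -= 1
                if pending[t] == 0:
                    ready.append(t)
        if len(order) != len(self.instructions):
            raise ValueError("local data-flow graph contains a cycle")
        return order


@dataclass
class Program:
    """Top level: hyperblocks keyed by name, linked through their successors."""
    blocks: Dict[str, Hyperblock] = field(default_factory=dict)

    def add(self, block: Hyperblock) -> None:
        self.blocks[block.name] = block


# Usage: two loads feed an add, whose result feeds a store.
body = Hyperblock(
    name="body",
    instructions=[
        Instruction("load", targets=[2]),
        Instruction("load", targets=[2]),
        Instruction("add", targets=[3]),
        Instruction("store"),
    ],
    successors=["exit"],
)
program = Program()
program.add(body)
program.add(Hyperblock(name="exit"))
print(body.dataflow_order())
```

In this sketch, an instruction becomes ready as soon as all of its producers have fired, which is the dataflow firing rule inside a hyperblock; ordering between hyperblocks is left to the outer control-flow graph.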
Keywords/Search Tags:Tiled Processor Architecture, Dataflow-like Computing Model, Compiler, Compiler Optimizations, Hyperblock-Level Branch Prediction, Predicated Execution