Compiler Design For VLIW DSP With Performance And Power Consumption Optimizations

Posted on:2007-10-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:D L Hu

GTID:1118360215970494

Subject:Computer Science and Technology

Abstract/Summary:

At present, the Very Long Instruction Word (VLIW) architecture has been adopted by most high-end Digital Signal Processors (DSPs). VLIW machines tend to use static scheduling, which allows the compiler to schedule machine resource usage directly, so the hardware's behavior depends heavily on how the compiler arranges the code. The VLIW architecture challenges the compiler in two respects: i) Without sophisticated compiling techniques, most VLIW DSPs fall far short of their performance goals. ii) Because the compiler can control the hardware's behavior effectively, it can, besides exploiting Instruction Level Parallelism (ILP), change the behavior of software running on the hardware through optimized scheduling so that the power/energy consumption of a given program decreases. Research on compiling techniques for high performance and low power/energy consumption is therefore valuable in both theory and practice.

Based on the national 863 project "32-Bit High Performance Embedded Digital Processing Processor (DSP) Chip Technology Research", this dissertation covers the development of a VLIW DSP compiler as well as compiling techniques for high performance and low power/energy consumption. The work consists of three main parts: i) Design and implementation of a VLIW DSP compiler based on the retargetable compiler infrastructure IMPACT. ii) A study of compiling techniques for high performance, namely complementary predication and software pipelining, that exploit features of the VLIW DSP architecture. iii) A study of compiling techniques for low power/energy, namely automatic vectorization for SIMD (Single Instruction Multiple Data) instructions and loop buffering. The main contributions are as follows:

(1) An optimization method for performance based on complementary predication is proposed.
The method optimizes performance from three perspectives: i) Based on a binary decision diagram (BDD) predicate analysis system, an algorithm is presented that uses complementary predicates to optimize control structures in programs. ii) Building on traditional graph-coloring register allocation, a new algorithm is presented that uses complementary predicates to construct a unified and simplified interference graph, which can reduce spill code. iii) Complementary predicate-aware scheduling is presented to reduce the superfluous commitment of resources to operations whose predicates evaluate to false at run time, and the architecture is modified to support such scheduling. In this way, resources are used more efficiently and higher ILP can be achieved.

(2) Hyperblock-based unified cluster assignment and modulo scheduling is proposed. Compared with a basic block, a hyperblock provides a larger scheduling region for exploiting ILP and enables modulo scheduling to handle loops with control flow. The clustered structure of the VLIW DSP requires the compiler to assign every operation and operand to a specific cluster and to coordinate data movement between clusters in order to achieve good ILP. The proposed method first forms hyperblocks from intermediate code using complementary predication, then performs modulo scheduling while assigning operations and operands to clusters. Experiments show that the proposed method is effective in improving performance.

(3) Following a comprehensive survey of low power/energy compilation technology, especially instruction-level and function-level power models, a low-energy compilation method based on SIMD automatic vectorization is proposed. Compared with general instructions, SIMD instructions are more energy-efficient. Previous methods are not general, because they either generate only common SIMD instructions or are domain-specific.
To generate more SIMD instructions for energy reduction, we present a low-energy compilation method based on SIMD automatic vectorization. The proposed method decomposes automatic vectorization into two phases: first, complex SIMD alternatives are recognized in the high-level intermediate code; then, after loop unrolling on the low-level intermediate code, actual SIMD instructions are generated by energy-cost-based tree-pattern matching. The method is more intuitive than those proposed previously. Experiments show that the proposed method is effective in optimizing performance and energy.

(5) A method to reduce the power consumption of instruction memory by loop buffering is proposed. In VLIW DSPs, a significant amount of power is consumed in instruction memories. Because digital signal processing applications spend most of their execution time in loops, loop buffering can be used to reduce the power consumed by instruction memories during instruction fetch. We present a low-power compilation method based on a compiler-controlled loop buffer, in which the compiler is responsible for selecting appropriate loops to place in the buffer through power analysis and for determining when to switch the loop buffer. The proposed method can reduce power consumption without degrading performance.

(6) Design and realization of an optimized VLIW DSP compiler. Based on the retargetable compiler infrastructure IMPACT, we design and realize a VLIW DSP compiler, and the compiler is optimized for high performance and low power/energy consumption using the research above.
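
The interference-graph idea in contribution (1) can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the live-range records, the predicate encoding, and the greedy coloring below are all assumptions made for illustration. The sketch only shows why two values guarded by complementary predicates need not interfere, since at run time at most one of them is actually written and read, so they can share a register.

```python
from itertools import combinations

# Hypothetical live-range records: (name, start, end, predicate).
# A predicate is None (always executed) or a pair (name, polarity),
# e.g. ("p", True) for p and ("p", False) for its complement.
RANGES = [
    ("a", 0, 6, None),
    ("t", 1, 4, ("p", True)),    # live only when p is true
    ("f", 1, 4, ("p", False)),   # live only when p is false
]

def complementary(p, q):
    """True when p and q are the two polarities of the same predicate."""
    return p is not None and q is not None and p[0] == q[0] and p[1] != q[1]

def interference_edges(ranges, predicate_aware):
    """Connect overlapping live ranges; optionally skip complementary pairs."""
    edges = set()
    for (n1, s1, e1, p1), (n2, s2, e2, p2) in combinations(ranges, 2):
        overlap = s1 < e2 and s2 < e1
        if overlap and not (predicate_aware and complementary(p1, p2)):
            edges.add(frozenset((n1, n2)))
    return edges

def greedy_colors(ranges, edges):
    """Assign each live range the lowest register not used by a neighbor."""
    color = {}
    for name, *_ in ranges:
        used = {color[n] for n in color if frozenset((name, n)) in edges}
        color[name] = next(c for c in range(len(ranges)) if c not in used)
    return color

plain = greedy_colors(RANGES, interference_edges(RANGES, False))
aware = greedy_colors(RANGES, interference_edges(RANGES, True))
print("registers without predicate analysis:", len(set(plain.values())))
print("registers with complementary predicates:", len(set(aware.values())))
```

In this toy case the predicate-unaware graph needs three registers, while the predicate-aware graph needs two because `t` and `f` share one. On a machine with few registers, that kind of saving is what reduces spill code.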

Keywords/Search Tags:

VLIW, Compiler, Complementary Predication, Modulo Scheduling, Low Power Compilation, Automatic Vectorization, Loop Buffering

Related items

1	Research On Vectorization Technology For Multi-cluster And VLIW DSP
2	Research On Loop Distribution Technology For Shenwei GCC Compiler System
3	Complementary compiler and architecture features for embedded VLIW processors
4	Research Of Automatic Compiler Tuning Base On Machine Learning
5	Research On Aggressive Butterfly Optimization Method Based On GCC Compiler
6	Tuning an adaptive-compilation search space with loop unrolling
7	Research On Loop Vectorization In LLVM
8	Adaptive predication via compiler-microarchitecture cooperation
9	Exploiting Some Key Techniques On The Loop Unrolling In GCC Compiler
10	Some Research On The Multi-wavelength Signals Buffering System Based On Dual Loop Coupling