Research And Implementation Of Key Technologies For Matrix2 DSP Compilation Optimization Based On GCC

Posted on: 2015-12-16  Degree: Master  Type: Thesis
Country: China  Candidate: X S Tu
GTID: 2308330479479189  Subject: Software engineering
Abstract/Summary:
Matrix2 DSP is a high-performance 64-bit floating-point digital signal processor with independent intellectual property rights, designed by the Department of Microelectronics in the Computer School, National University of Defense Technology. Used mainly in weather forecasting, graphics and image processing, and similar fields, Matrix2 DSP offers strong data-computing capability, high-speed operation, and powerful parallel processing. To support high-level language programming for Matrix2 DSP, our research group developed a Matrix2 DSP compiler based on the open-source compiler GCC-4.7.0. Because Matrix2 DSP uses a VLIW architecture, its computing capability depends largely on the performance of the compiler. Taking the architecture and the characteristics of the Matrix2 DSP instruction set into account, this thesis optimizes the Matrix2 compiler in three respects: the candidate functional-unit allocation algorithm, the branch-delay-slot scheduling algorithm, and the mapping of irregular instructions. Together, these optimizations substantially improve the Matrix2 compiler. The main content and contributions of the thesis are as follows:

Design and implementation of the candidate functional-unit allocation algorithm for the Matrix2 DSP compiler. The novel instruction set of Matrix2 DSP requires the compiler to allocate an appropriate execution unit to each instruction from that instruction's candidate functional units. Building on the instruction-constraint-matching mechanism, this thesis takes the instruction word as the basic unit of allocation and jointly considers which functional units suit the current instruction and which functional units are still unoccupied. It then implements an allocation rule specific to the Matrix2 DSP compiler.
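The idea of allocating functional units per instruction word can be illustrated with a small sketch. This is not the thesis's algorithm, which the abstract does not spell out. It is a simple greedy stand-in that assumes each instruction carries a list of candidate units and that no two instructions in one instruction word may share a unit. The unit names (`FMAC0`, `FMAC1`, `ALU`, `LS0`), the instruction names, and the "most constrained first" ordering are all hypothetical choices made for illustration.

```python
def allocate_units(instruction_word):
    """Greedily assign one functional unit to each instruction in a word.

    instruction_word: list of (instruction_name, candidate_units) pairs.
    Returns a dict mapping instruction_name -> unit, or None when the
    greedy pass cannot place every instruction without a unit conflict.
    """
    occupied = set()      # units already taken in this instruction word
    assignment = {}
    # Assumption: place instructions with the fewest candidates first,
    # so that flexible instructions do not steal the only usable unit.
    ordered = sorted(instruction_word, key=lambda item: len(item[1]))
    for name, candidates in ordered:
        free = [unit for unit in candidates if unit not in occupied]
        if not free:
            return None   # greedy pass failed for this word
        occupied.add(free[0])
        assignment[name] = free[0]
    return assignment


# Hypothetical instruction word with three instructions.
word = [
    ("fmul", ["FMAC0", "FMAC1"]),
    ("fadd", ["FMAC0", "FMAC1", "ALU"]),
    ("load", ["LS0"]),
]
result = allocate_units(word)
```

A greedy pass like this can fail on words for which a conflict-free assignment exists. A production allocator would need backtracking or bipartite matching, and would consult the real constraint tables of the target.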
The candidate functional-unit allocation algorithm improves on the corresponding stage of GCC, helps the compiler exploit instruction-level parallelism, and raises both hardware utilization and program execution performance.

Design and implementation of the branch-delay-slot scheduling optimization algorithm for the Matrix2 DSP compiler. Every branch, jump, call, and return instruction in the Matrix2 DSP instruction set has six delay slots, so filling as many delay slots as possible contributes greatly to processor performance. Building on GCC's delay-slot scheduling algorithm, this thesis proposes an optimized branch-delay-slot scheduling algorithm that modifies the region searched for instructions available to fill the slots, relaxes the restrictions on which instructions may occupy a delay slot, and implements branch-delay-slot scheduling in the Matrix2 DSP compiler. The optimized algorithm raises the delay-slot fill rate and effectively reduces the branch penalty.

Design and implementation of irregular-instruction mapping for the Matrix2 DSP compiler. The instruction set introduces a large number of irregular instructions with distinct operands, whose mapping the current version of GCC cannot support. Based on GCC's instruction mapping mechanism and taking the features of the irregular instructions into account, this thesis modifies the consistency-checking rules for arithmetic operations and the related conversion rules defined by standard C, and adds support for mapping irregular instructions to the RTL instruction expansion mechanism.
As a result, the Matrix2 DSP compiler supports the irregular instructions and maps them correctly and efficiently.
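The branch-delay-slot filling described in the abstract can be sketched in the same spirit. The code below is not GCC's delay-slot pass and not the thesis's optimized algorithm. It is a minimal model that assumes candidates come only from the instructions preceding the branch in the same block, and that an instruction may move into a slot when it has no register dependence on the branch or on any later instruction that stays in front of the branch. The `Insn` type, the register names, and the assembly text are hypothetical; only the count of six delay slots comes from the abstract.

```python
from dataclasses import dataclass

DELAY_SLOTS = 6  # six delay slots per branch, as stated in the abstract


@dataclass(frozen=True)
class Insn:
    text: str
    defs: frozenset = frozenset()  # registers written
    uses: frozenset = frozenset()  # registers read


def conflicts(a, b):
    """True if swapping a and b could change results (RAW, WAR or WAW)."""
    return bool(a.defs & b.uses or a.uses & b.defs or a.defs & b.defs)


def fill_delay_slots(block, branch):
    """Move independent instructions from before the branch into its slots.

    Returns (remaining_block, slots); unfilled slots are padded with nops.
    """
    stay, moved = [], []  # both collected in reverse program order
    for insn in reversed(block):
        movable = (
            len(moved) < DELAY_SLOTS
            and not conflicts(insn, branch)
            and not any(conflicts(insn, later) for later in stay)
        )
        (moved if movable else stay).append(insn)
    slots = list(reversed(moved))
    slots += [Insn("nop")] * (DELAY_SLOTS - len(slots))
    return list(reversed(stay)), slots


# Hypothetical block: the compare feeds the branch, the multiply does not.
block = [
    Insn("add r1, r2, r3", frozenset({"r1"}), frozenset({"r2", "r3"})),
    Insn("cmp p0, r1, r4", frozenset({"p0"}), frozenset({"r1", "r4"})),
    Insn("mul r5, r6, r7", frozenset({"r5"}), frozenset({"r6", "r7"})),
]
branch = Insn("br p0, L1", uses=frozenset({"p0"}))
remaining, slots = fill_delay_slots(block, branch)
```

In this example only the multiply can move, so five of the six slots stay as nops. This is the kind of low fill rate that widening the search region and relaxing slot restrictions, as the thesis describes, is meant to improve.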
Keywords/Search Tags: Candidate functional unit allocation, Delay slot scheduling, Irregular instruction mapping, Instruction constraint matching