Research And Implementation Of Software Pipelining Framework For BWDSP104X

Posted on: 2017-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L T Hong
Full Text: PDF
GTID: 2308330485951848
Subject: Computer software and theory
Abstract/Summary:
Most modern high-performance digital signal processors use a VLIW architecture, which can issue multiple instructions in the same clock cycle in order to exploit instruction-level parallelism and achieve higher performance. Programs can take full advantage of the processor's hardware resources only through compiler optimization, which poses challenges for the compiler's back-end optimization. Loops are generally the most time-consuming part of a program, which makes loop optimizations such as vectorization, loop unrolling, predicated optimization, and software pipelining especially important for improving performance.

Our research is based on the Open64 compiler, an open-source project under a GNU license. The compiler is a good research platform because it has a clear modular code structure and a comprehensive back-end optimization design. Open64 already implements software pipelining for the IA64 target, with a detailed overall process that is valuable to our research.

The main task of the thesis is the implementation of software pipelining for BWDSP104X on the Open64 platform. First, loops are selected: loops that are too small or irregular are not software pipelined. Second, the minimum initiation interval is calculated from the resource constraints given by the machine description and the data dependences given by the data dependence graph. Third, modulo scheduling is performed using the initiation interval and the modulo resource table. Lastly, BWDSP104X object code is obtained through modulo variable expansion and register allocation. To maximize the performance of software pipelining, the multi-cluster resources should be fully exploited, so the thesis proposes instruction clustering, instruction scheduling, and software pipelining for the multi-cluster architecture. Performance is still affected, however, if the loop contains branch statements. The processor provides predicated instructions and predicate registers, which give hardware support for predicated optimization. Experimental results show that multi-cluster software pipelining combined with predicated optimization yields better performance improvement for loop programs running on the BWDSP104X multi-cluster processor.
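
The minimum initiation interval step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's Open64 implementation: the function names (`res_mii`, `rec_mii`, `min_ii`), the resource classes, the unit counts, and the latencies are all hypothetical and do not describe the real BWDSP104X. The sketch assumes the common textbook formulation in which the bound is the larger of a resource-constrained bound and a recurrence-constrained bound, and it finds the recurrence bound by searching for the smallest interval at which the dependence graph has no positive-weight cycle.

```python
from collections import Counter
from math import ceil


def res_mii(op_classes, units):
    """Resource bound: for each resource class, ceil(uses / available units)."""
    uses = Counter(op_classes)
    return max(ceil(count / units[cls]) for cls, count in uses.items())


def has_positive_cycle(num_ops, edges, ii):
    """Longest-path Bellman-Ford with edge weight latency - ii * distance.

    Every node starts at 0 (an implicit source). If relaxation still changes
    something on the num_ops-th pass, a positive-weight cycle exists.
    """
    dist = [0] * num_ops
    for _ in range(num_ops):
        changed = False
        for src, dst, latency, distance in edges:
            candidate = dist[src] + latency - ii * distance
            if candidate > dist[dst]:
                dist[dst] = candidate
                changed = True
        if not changed:
            return False
    return True


def rec_mii(num_ops, edges):
    """Recurrence bound: smallest ii for which no dependence cycle is violated."""
    # With ii equal to the sum of all latencies, any cycle that crosses at
    # least one iteration boundary has non-positive weight, so the search ends.
    upper = max(1, sum(latency for _, _, latency, _ in edges))
    for ii in range(1, upper + 1):
        if not has_positive_cycle(num_ops, edges, ii):
            return ii
    raise ValueError("dependence cycle with zero iteration distance")


def min_ii(op_classes, units, edges):
    """Minimum initiation interval: max of the resource and recurrence bounds."""
    return max(res_mii(op_classes, units), rec_mii(len(op_classes), edges))


# Hypothetical loop body: 0 = load, 1 = multiply, 2 = add, 3 = store.
OP_CLASSES = ["mem", "alu", "alu", "mem"]
UNITS = {"mem": 1, "alu": 2}  # assumed unit counts, not BWDSP104X data
# Each edge is (source, destination, latency, iteration distance).
EDGES = [
    (0, 1, 2, 0),  # load -> multiply
    (1, 2, 3, 0),  # multiply -> add
    (2, 3, 1, 0),  # add -> store
    (2, 1, 1, 1),  # add feeds the multiply of the next iteration
]

if __name__ == "__main__":
    print(res_mii(OP_CLASSES, UNITS), rec_mii(len(OP_CLASSES), EDGES))
    print(min_ii(OP_CLASSES, UNITS, EDGES))
```

For this hypothetical loop, the two memory operations share one memory unit, giving a resource bound of 2, while the multiply/add recurrence has total latency 4 over one iteration, giving a recurrence bound of 4. `min_ii` therefore returns 4, the smallest interval a modulo scheduler would try first.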
Keywords/Search Tags: Compiler, Software pipelining, Initiation interval, Modulo scheduling, Register allocation, Multi-cluster optimization, Predicate optimization