
Research On DSP Automatic Vectorization And Optimization

Posted on: 2014-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Suo
GTID: 2268330401976782
Subject: Computer Science and Technology
Abstract/Summary:
As Single Instruction Multiple Data (SIMD) extensions have come into wide use, SIMD extensions to microprocessors, which can accelerate applications, have become an important means of improving the performance of high-performance computers. As multimedia and digital signal processing applications grow more popular, many embedded processors also provide SIMD instructions. How to take full advantage of these instructions while making the most of the characteristics of Digital Signal Processor (DSP) hardware has become a central concern of current SIMD automatic vectorization.
This thesis uses the Xtensa DSP architecture as its experimental platform and adopts an improved Superword Level Parallelism (SLP) algorithm. It targets the unique features of the DSP architecture and aims to integrate a variety of vectorization methods. Building on the SLP algorithm, it applies instruction-level and redundancy optimization techniques so that SIMD instructions are exploited as fully as possible. A cross-basic-block program transformation is also designed to overcome the obstacle that SLP alone cannot exploit sufficient parallelism. In addition, the thesis studies software-pipelining-based vectorization of nested loops and the transformation of basic blocks into vectorized form.
The main contents of the thesis are as follows.
First: the SIMD instructions of the specific DSP architecture are analyzed, and the Open64 vectorization algorithm is studied and tested. On this basis, a DSP automatic vectorization framework built on the Open64 platform is proposed, so that DSP architecture-specific features and automatic vectorization methods can be combined more effectively and the optimizations achieve their greatest effect.
Second: because most existing commercial compilers do not support automatic vectorization for DSP architectures, several modifications are made to improve the optimization techniques of the Open64 vectorization algorithm. They involve three aspects. (1) For non-contiguous and unaligned memory accesses, unaligned vector memory accesses are replaced by aligned accesses combined with shift (restructuring) instructions. (2) Because redundant load/store operations exist between basic blocks, an analysis identifies each basic block and its redundant loads and stores, so that repeated reads or writes of the same location can be deleted. (3) A pass is added to the compiler back end that hoists loop-invariant statements out of vectorized loops and merges instructions, reducing the number of times these statements execute and thereby improving program performance.
Third: the vectorization performed by existing compilers generally works on a single basic block only and cannot map code across basic-block boundaries into the vectorizer. Because the parallelism available within a single basic block is often insufficient, large numbers of redundant operations are generated, which can severely degrade program performance. The thesis therefore proposes an SLP optimization algorithm with cross-basic-block transformation and loop distribution, which explores the parallelism among statements in the basic blocks of the innermost loop so that SLP can automatically generate vectorized code that uses more SIMD instructions.
Fourth: because inner loops typically run more iterations, existing SLP algorithms are designed to vectorize only the innermost loop. In practice, however, unaligned data or reductions in the loop can hinder vectorization, and the inner loop can then cause a loss of performance. The thesis therefore proposes a vectorization technique based on software pipelining: it analyzes the innermost loop and applies loop unrolling to expand it, break its dependence cycle, and reduce the factors that limit vectorized performance, so that adjacent iterations can execute in an overlapped fashion.
Finally, the research work of the whole thesis is summarized and the plan for future research is discussed.
Keywords/Search Tags: DSP, SIMD, SLP, Redundancy Optimization, Cross-Basic-Block Transformation, Software Pipelining
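The abstract's second point mentions replacing unaligned vector accesses with aligned accesses plus shift (restructuring) instructions. The Python sketch below models that idea under stated assumptions; it is not the thesis's Open64 implementation and not the Xtensa instruction set. VLEN = 4 is a hypothetical vector width, and aligned_load and shift_combine are hypothetical helpers standing in for an aligned vector load and a two-register shift instruction.

```python
VLEN = 4  # hypothetical vector width, in elements


def aligned_load(mem, addr):
    """Model an aligned vector load; addr must be a multiple of VLEN.

    Slicing may return fewer than VLEN elements at the end of mem; real
    hardware typically reads a full vector, so buffers need padding there.
    """
    if addr % VLEN != 0:
        raise ValueError("aligned_load needs an aligned address")
    return tuple(mem[addr:addr + VLEN])


def shift_combine(lo, hi, offset):
    """Model a two-register shift: VLEN lanes of lo ++ hi, from lane offset."""
    return (lo + hi)[offset:offset + VLEN]


def unaligned_load(mem, addr):
    """Emulate one unaligned load with two aligned loads and one shift."""
    offset = addr % VLEN
    base = addr - offset
    return shift_combine(aligned_load(mem, base),
                         aligned_load(mem, base + VLEN), offset)


def vector_copy_from_unaligned(mem, src, n):
    """Read n elements (n a multiple of VLEN) starting at an unaligned src.

    Each aligned vector is loaded once: it is the high half of one shift
    and is carried over as the low half of the next.
    """
    if n % VLEN != 0:
        raise ValueError("n must be a multiple of VLEN")
    offset = src % VLEN
    base = src - offset
    out = []
    lo = aligned_load(mem, base)
    for i in range(0, n, VLEN):
        hi = aligned_load(mem, base + i + VLEN)
        out.extend(shift_combine(lo, hi, offset))
        lo = hi  # reuse the aligned vector instead of reloading it
    return out
```

Compared with calling unaligned_load for every vector, the streaming loop needs one aligned load per output vector plus one initial load, because each aligned vector is shared by two consecutive shifts.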
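Deleting redundant loads and stores across basic blocks, as described in the abstract's second point, is commonly done with a forward "available values" dataflow analysis that intersects facts where control flow joins. The sketch below is one such analysis over a toy IR and is not necessarily the thesis's own method. The instruction tuples, the SSA-style rule that each register is assigned once, and the assumption that distinct address names never alias are all simplifications introduced here for illustration.

```python
# Toy IR, introduced for illustration only:
#   ("load", reg, addr)   reg <- mem[addr]
#   ("store", addr, reg)  mem[addr] <- reg
#   ("copy", dst, src)    dst <- src
# Assumptions: every register is assigned exactly once (SSA-like) and
# distinct address names never alias. A fact (addr, reg) means
# "mem[addr] currently holds the value of reg".


def step(avail, op):
    """Update the set of available (addr, reg) facts after one instruction."""
    if op[0] == "load":
        _, reg, addr = op
        return avail | {(addr, reg)}
    if op[0] == "store":
        _, addr, reg = op
        # The store overwrites addr, so older facts about it are stale.
        return frozenset(f for f in avail if f[0] != addr) | {(addr, reg)}
    return avail  # copies do not touch memory


def available_in(blocks, preds, entry):
    """Iterate to a fixed point; facts are intersected at join points."""
    out = {b: None for b in blocks}  # None = not reached yet (optimistic top)
    ins = {}
    changed = True
    while changed:
        changed = False
        for b, code in blocks.items():
            if b == entry:
                cur = frozenset()
            else:
                reached = [out[p] for p in preds.get(b, []) if out[p] is not None]
                if not reached:
                    continue
                cur = frozenset.intersection(*reached)
            ins[b] = cur
            for op in code:
                cur = step(cur, op)
            if cur != out[b]:
                out[b] = cur
                changed = True
    return ins


def remove_redundant(blocks, preds, entry):
    """Turn redundant loads into register copies and drop redundant stores."""
    ins = available_in(blocks, preds, entry)
    result = {}
    for b, code in blocks.items():
        if b not in ins:  # unreachable block: leave it untouched
            result[b] = list(code)
            continue
        avail, new_code = ins[b], []
        for op in code:
            new_op = op
            if op[0] == "load":
                holders = sorted(r for a, r in avail if a == op[2])
                if holders:  # value already in a register: skip the memory read
                    new_op = ("copy", op[1], holders[0])
            elif op[0] == "store" and (op[1], op[2]) in avail:
                continue  # memory already holds exactly this value
            new_code.append(new_op)
            avail = step(avail, op)
        result[b] = new_code
    return result
```

On a diamond-shaped control-flow graph, a load in one branch of a value already loaded before the branch becomes a register copy, while a load of the same address after the join stays in place if the other branch may have stored to that address.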
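The abstract's fourth point uses loop unrolling to break the dependence cycle that a reduction creates in the innermost loop. One common way to do this, shown here as a Python sketch rather than the thesis's exact transformation, is to unroll by the vector width and keep one independent partial sum per lane. VLEN = 4 and the function names are assumptions for illustration.

```python
VLEN = 4  # hypothetical vector width, in elements


def dot_serial(a, b):
    """Reference loop: each iteration reads the acc written by the previous
    one, a loop-carried dependence cycle that prevents running the
    iterations in parallel as written."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled(a, b, lanes=VLEN):
    """Unroll by `lanes` and keep one partial sum per lane.

    The partial sums do not depend on each other, so each unrolled body maps
    onto one SIMD multiply-accumulate and successive bodies can overlap.
    Reassociation is exact for integers; for floating point the result may
    differ slightly, so compilers usually need relaxed FP semantics for it.
    """
    n = len(a)
    partial = [0] * lanes
    main = n - n % lanes
    for i in range(0, main, lanes):
        for lane in range(lanes):  # stands in for one vector MAC instruction
            partial[lane] += a[i + lane] * b[i + lane]
    total = sum(partial)  # horizontal reduction after the loop
    for i in range(main, n):  # scalar epilogue for leftover elements
        total += a[i] * b[i]
    return total
```

The design choice is to move the single accumulator's recurrence out of the hot loop: inside it, the lanes are independent, and only the short horizontal sum and scalar epilogue remain serial.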