| With the development of multimedia and scientific computing, Single Instruction, Multiple Data (SIMD) extensions have become widely supported and increasingly refined in processors, and they play a growing role in accelerating data-streaming applications. Automatic vectorization is an important way to exploit SIMD extensions, and the most popular approach is Superword-Level Parallelism (SLP). VEGEN (A Vectorizer Generator for SIMD and Beyond) uses Lane-Level Parallelism (LLP), which is finer-grained than SLP. The LLP algorithm can identify and capture forms of parallelism that SLP does not support and that are implemented by non-SIMD vector instructions, so it covers a wider range of vectorization scenarios. However, performance analysis shows that VEGEN has a very large search space when selecting candidate groups of vectorized instructions, that its strategy for identifying vector instruction matching patterns is relatively conservative, and that it does not consider efficient data reorganization and layout between vectorized instructions. These issues limit VEGEN’s advantages. To improve VEGEN’s automatic vectorization capability, this work addresses these problems in three steps. First, it proposes a screening method for the enumeration of vectorized instruction candidate groups. The method applies when candidate groups of different vectorization lengths intersect: among the intersecting groups, the longest candidate group is retained and the other combinations are discarded, which reduces the search space during candidate group selection. Second, building on this improvement, it proposes a deconstruction optimization method for the "read after read" dependency of load instructions. This method targets load instructions in Static Single Assignment (SSA) form that access the same memory address. It uses the Definition-Use Chain (DUC) and the Use-Definition Chain (UDC) to perform dependency analysis on such load instructions, and it deconstructs the dependencies between qualifying load instructions so that scalar instructions loading the same address can be vectorized. Finally, it proposes an optimization method for the reorganization and layout of vectorized instruction data. This method comprehensively compares the vectorized layout of the load instructions with that of the store instructions and, for the data layout at hand, chooses the better vectorization layout, reducing the vectorization performance loss caused by additional data reorganization. The correctness and effectiveness of these optimization methods are verified on the TSVC2 benchmark suite. The experimental results show that, after applying the candidate group enumeration screening method, the "read after read" dependency deconstruction method for load instructions, and the data reorganization and layout optimization method, the speedup of the TSVC2 test cases improves. Compared with LLVM 11.0.0, the average vectorization speedup reaches 19.85%, and the expected correct instructions and results are generated. This shows that the optimization strategy in this work effectively improves VEGEN’s automatic vectorization capability. |