Font Size: a A A

Improving the Effectiveness of Searching for Isomorphic Chains in Superword Level Parallelis

Posted on:2018-01-30Degree:Ph.DType:Dissertation
University:North Carolina State UniversityCandidate:Huh, JoonmooFull Text:PDF
GTID:1449390005451693Subject:Computer Engineering
Abstract/Summary:
Most high-performance microprocessors come equipped with general-purpose Single InstructionMultipleData (SIMD) execution engines to enhance performance.Compilers use auto-vectorization techniques to identify vector parallelism and generate SIMD code so that applications can enjoy the performance benefits provided by SIMD units. Superword Level Parallelism (SLP), one such vectorization technique, forms vector operations by merging isomorphic instructions into a vector operation and linking many such operations into long isomorphic chains. However, effective grouping of isomorphic instructions remains a key challenge for SLP algorithms.;In this work, we describe a new hierarchical approach for SLP. We decouple the selection of isomorphic chains and arrange them in a hierarchy of choices at the local and global levels. First, we form small local chains from a set of preferred patterns and rank them. Next, we form long global chains from the local chains using a few simple heuristics. Hierarchy allows us to balance the grouping choices of individual instructions more effectively within the context of larger local and global chains, thereby finding better opportunities for vectorization.;We implement our algorithm in LLVM, and we compare it against prior work and the current SLP implementation in LLVM. A set of applications that benefit from vectorization are taken from the NAS Parallel Benchmarks and SPEC CPU 2006 suite to compare our approach and prior techniques. We demonstrate that our new algorithm finds better isomorphic chains. Our new approach achieves an 8.6% speedup, on average, compared to non-vectorized code and 2.5% speedup, on average, over LLVM-SLP. In the best case, the BT application has 11% fewer total dynamic instructions and achieves a 10.9% speedup over LLVM-SLP.;We also propose a new mathematical approach to figure out the optimal selections for SLP. First, we assign 0-1 integer variables to each possible isomorphic seed in a basic block. Next, we design an objective function with constraints of the variables. The objective function represents the cost of seed selections.;We implement the 0-1 integer programming in LLVM, and we compare our optimal seed selections with the one from current SLP pass in LLVM. The small basic blocks of applications from the NAS Parallel Benchmarks are evaluated.We find that, most of time, the 0-1 integer programming provides the same selections with SLP pass in LLVM for the small basic blocks. This shows that the heuristics of current SLP in LLVM works well for the small basic blocks.
Keywords/Search Tags:SLP, Isomorphic chains, LLVM, Small basic blocks, SIMD
Related items