Loop unrolling along wavefronts and wavefront based techniques for exploiting instruction and thread level parellelism

Posted on:2014-03-12

Degree:Ph.D

Type:Thesis

University:Santa Clara University

Candidate:Steinbrecher, Johann

Full Text:PDF

GTID:2458390008455971

Subject:Engineering

Abstract/Summary:

Loop unrolling is a loop transformation where a few loop iterations are grouped as a super iteration to have more independent instructions that feed multiple pipelines in a system, to decrease the total loop overhead or explore data locality by reusing loaded data. In this thesis, loop unrolling is characterized by the unrolling factor, the number of iterations in a super iteration and the unrolling direction, the choice of iterations to be grouped to form the super iteration. Data dependencies exist between different iterations. To increase the number of independent instructions in the super iteration, independent iterations should be grouped to form the super iteration. In most previous work, loops are unrolled along the inner loop, which may group dependent iterations together and not be the best unrolling direction. This thesis uses a linear schedule to group independent iterations on the same wave front, a hyper-plane in the loop iteration space. Then, the loop is unrolled along the wavefront which guarantees all iterations in the same super iteration are independent. The selection of the optimal unrolling factor is based on the assumption that if all the pipelines are saturated, i.e., there is no bubble, the performance should not be bad. Two necessary conditions and a sufficient condition for optimality are presented and used to find the optimal unrolling factor for systems with low memory access latencies. The total execution time is expressed as a function of algorithm parameters such as the data dependence structure and instruction counts, architecture parameters such as the number of pipelines and number of registers and the unrolling factor. An example, the longest common subsequence problem is used to demonstrate the method. An implementation of the proposed unrolling on the longest common subsequence problem scores a speedup of 1.14 on an IntelRTMAtom(TM) micro-architecture over traditional methods. Furthermore, it is presented how loop unrolling along wavefronts can aim in partitioning a loop nest in multiple threads. A sufficient condition for creating communication free threads is presented and applied. Additionally, new implementation strategies for supernode transformations based on the concept of wavefronts are presented.

Keywords/Search Tags:

Unrolling, Loop, Super iteration, Wavefronts, Iterations, Presented

Related items

1	Research On Inner Loop Unrolling Method For Vector DSP
2	Exploiting Some Key Techniques On The Loop Unrolling In GCC Compiler
3	Iterative Loop Unrolling Optimizing For Embedded Systems
4	Program speedup through generalized loop unrolling
5	Research On Loop Transformation Optimization Based On Domestic Shenwei Compiler
6	Loop Realization And Optimization Based On X Stream Processor
7	Design And Implementation Of The Software Pipelined Loop Buffer For The Innovative X DSP
8	The Presented Image Of Spain Presented In The Chinese Share Video Websites
9	Tuning an adaptive-compilation search space with loop unrolling
10	Decomposition and fast distributed iterations for optimization of networked systems