Font Size: a A A

Performance Portable Tracking of Evolving Surfaces

Posted on:2012-11-22Degree:Ph.DType:Dissertation
University:Carnegie Mellon UniversityCandidate:Yu, WeiFull Text:PDF
GTID:1458390011456637Subject:Engineering
Abstract/Summary:
Our goal is to deliver high performance portable implementations across mainstream multicore machines for an important application: tracking evolving surfaces. Level set is a widely used numerical algorithm for tracking evolving surfaces, which embeds the surface in a regularly discretized volume and performs numerical computation in the volume. The narrow band level set algorithm is a variation of the level set method that reduces the computational cost by tracking the evolution in a narrow band, which is a neighborhood region around the evolving surface. Computationally, it performs numerical stencil computation on the points in the narrow band, and tracks the motion of the narrow band in the meantime. The narrow band level set algorithm is featured by abundant data parallelism and temporal data reuse. However, code optimization for the algorithm is complicated by the irregularity of the dynamically evolving sparse band and the data dependant control flow inherent in the algorithm.;We propose a novel code transformation for the narrow band level set algorithm, namely projective time skewing. This technique effectively extracts data reuse with low overhead. It is essentially different from applying the existing time skewing technique to the algorithm, and is much more efficient. In addition, we applied a set of other code transformations to fully utilize state-of-the-art multicore CPUs. These include in-core stencil optimization, lower level memory optimizations and parallelization, with focus on utilizing in-core resources, reducing memory access pressure, and achieving sealable performance across multicores. We incorporate all these optimizations in a parameterized framework, and build an auto-tuner on top of it to automatically search for good parameter values on different machines. The auto-tuning approach enables us to deliver performance portable code on different machines without manual re-optimization.;We did experiments on two Intel x86 cache-based multicore CPUs: a Intel dual-socket 2.8 GHz Xeon 5560 and a 1.6 GHz Atom N270. They exhibit significant variance in their core micro-architectures, DRAM bandwidth, peak flop rates, and the number of hardware threads. Fully tuned code shows up to 195x speedup over a straight forward C code implementation on the dual-quadcore Xeon system. The computational rate reaches 26% to 35% of the machine peak flop rate on Xeon, and 12% to 20% on Atom, across a wide range of problem sizes. The optimal parameters found by our auto-tuner averagely provide a performance gain of 12% on Xeon and 17% on Atom over an educated guess of good parameterizations.
Keywords/Search Tags:Performance, Evolving, Tracking, Narrow band level set algorithm, Xeon
Related items