
Optimizations Of Memory-access For Stencil Computations On Shared-memory Multi-core Processor

Posted on: 2016-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Dong
Full Text: PDF
GTID: 2348330536467716
Subject: Computer Science and Technology
Abstract/Summary:
Shared-memory multi-core architectures with multi-level caches are widely used in high performance computing. Although multi-level caches have proved effective in alleviating the “memory wall”, memory access efficiency remains low for scientific programs because they issue a large number of memory access instructions. In addition, multi-core parallel execution places higher demands on memory bandwidth. Reducing the frequency of memory accesses and hiding their latency is therefore the focus of research on memory access optimization.

Stencil computations are an important class of memory-intensive kernels used in application domains ranging from image and video processing to simulation and computational science across several areas of natural science. Stencil computations have recently become an optimization target for a growing number of researchers, with work on parallelism, communication, and load balancing, but memory access optimization still needs further study. This dissertation focuses on memory access optimizations for stencil computations on SMP platforms, including loop tiling, vector permutation, and data prefetching. The main contributions and innovations of this dissertation are as follows:

1. The blocking method of loop tiling is improved, and a parallel algorithm that binds data blocks to OpenMP threads is proposed. The improved blocking method jointly considers the multi-core or multi-thread parallelism and the multi-level cache structure of the platform. The new parallel algorithm not only reduces the high parallel overhead of the traditional parallel algorithm, but also takes full advantage of the data reused between every two adjacent blocks.

2. Vector permutation is used to reduce redundant memory accesses in vectorized stencil computations, and a vector permutation method based on splicing and shifting is proposed. Given the memory access characteristics of stencil computations, SIMD optimization is feasible, and some data elements are reused across vectors. Vector permutation is therefore used to reduce the number of loads and stores and to improve memory access efficiency. Several vector permutation methods are proposed for stencil computations; in particular, the method based on splicing and shifting effectively decreases the number of vector operations.

3. Data prefetching is used to hide memory access latency in stencil computations. Data prefetching uses idle bandwidth to fetch data and hides latency by overlapping memory accesses with computation. Based on an analysis of the hardware and software prefetch mechanisms on the Intel x86-64 platform, software data prefetching is applied to stencil computations with both contiguous and non-contiguous memory access patterns. In addition, loop unrolling and loop peeling are used to optimize the software data prefetching. The experimental results show that software data prefetching benefits stencil computations with non-contiguous memory access.
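As a rough illustration of the loop tiling idea in contribution 1, the following C sketch tiles one sweep of a 5-point 2D stencil and statically assigns rows of tiles to OpenMP threads. It is only a minimal sketch under stated assumptions, not the algorithm proposed in the dissertation: the 5-point stencil, the 0.2 weights, the square tile shape, and the function names `stencil_naive` and `stencil_tiled` are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 5-point stencil: average of a cell and its four neighbours. */
static inline double point(const double *in, long n, long i, long j)
{
    return 0.2 * (in[i * n + j]
                + in[(i - 1) * n + j] + in[(i + 1) * n + j]
                + in[i * n + j - 1]   + in[i * n + j + 1]);
}

/* Reference version: one untiled sweep over the interior of an n x n grid. */
void stencil_naive(const double *in, double *out, size_t n)
{
    long N = (long)n;
    for (long i = 1; i < N - 1; i++)
        for (long j = 1; j < N - 1; j++)
            out[i * N + j] = point(in, N, i, j);
}

/* Tiled version: the interior is cut into tile x tile blocks so that the
 * working set of a block can stay in cache. schedule(static) gives each
 * thread a fixed, contiguous set of tile rows (an assumed stand-in for
 * "binding blocks to threads"). Without OpenMP enabled the pragma is
 * ignored and the loop runs serially with the same result. */
void stencil_tiled(const double *in, double *out, size_t n, size_t tile)
{
    assert(tile > 0);
    long N = (long)n;
    long T = (long)tile;
    long last = N - 1;              /* exclusive upper bound of the interior */

    #pragma omp parallel for schedule(static)
    for (long ii = 1; ii < last; ii += T) {
        long i_end = (ii + T < last) ? ii + T : last;
        for (long jj = 1; jj < last; jj += T) {
            long j_end = (jj + T < last) ? jj + T : last;
            for (long i = ii; i < i_end; i++)
                for (long j = jj; j < j_end; j++)
                    out[i * N + j] = point(in, N, i, j);
        }
    }
}
```

Both functions write only interior cells and read from a separate input array, so tiles are independent within a sweep and the tiled version should produce the same values as the reference. In practice the tile size would be tuned to the cache sizes of the target platform; compiling with an OpenMP flag such as `-fopenmp` enables the parallel loop.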
Keywords/Search Tags: Stencil Computation, SMP, Multi-level Cache, Loop Tiling, SIMD, Vector Permutation, Data Prefetch