
Research On Automatic Parallelization And Optimization Technologies For Shared Memory Architecture

Posted on: 2014-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Liu
GTID: 1268330401476864
Subject: Computer Science and Technology
Abstract/Summary:
The peak speed of high-performance computers has been pushed to new heights again and again by the adoption and innovation of parallel architectures. However, the difficulty of programming these architectures poses a huge challenge for programmers. Among the approaches to the parallel programming problem, one that is very promising yet very challenging is automatic parallelization: the process of automatically converting a sequential program into a version that runs directly on multiple processing elements without altering the program's semantics. Because it demands little effort from programmers, automatic parallelization is very attractive.

Shared memory architectures have always played an important role in the development of high-performance architectures. Automatic parallelization techniques for shared memory architectures have been studied widely in the domain of scientific and numeric applications, but many challenges remain in the automatic generation of high-performance parallel programs, such as the parallelization of loops with inter-iteration dependences, the cost models used for profit estimation, and data transfer optimization for heterogeneous platforms.

This dissertation, built on the research and development of the automatic parallelizing compiler SW-VEC, addresses automatic parallelization and optimization technologies for shared memory architectures. Its main contributions and innovations are as follows:

1. A pipelining-granularity optimization algorithm based on a DOACROSS cost model is proposed to obtain the optimal pipelining granularity, together with an OpenMP-based automatic pipelining parallelization algorithm for regular DOACROSS loops on shared memory platforms. Together, these algorithms enable the automatic generation of effective pipelined code (see the first sketch after this abstract).

2. An improved OpenMP-based PS-DSWP algorithm is proposed. It is implemented on a high-level intermediate representation and therefore does not depend on any particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used by the algorithm is built over basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built over individual instructions. OpenMP is employed to assign tasks and to synchronize threads while avoiding platform-specific limitations (see the second sketch).

3. A compile-time cost model for estimating the profit of automatic parallelization is built following a modularized, hierarchical strategy that partitions the model into two levels. The major benefit of this strategy is that the model is easy to design and implement, and the result is flexible and extensible (see the third sketch).

4. An approach to managing data storage and data transfer between main memory and local memory is proposed by designing a potential extension to OpenMP. Blocked regular array regions and a union operation over them are defined to describe the set of transferred array data, and a method is developed to compute these array regions with the polyhedral model (see the fourth sketch).

The algorithms and the cost model presented in this dissertation have been implemented and applied in the SW-VEC system, and their effectiveness has been validated.
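First sketch (contribution 1): a minimal illustration of the code pattern that pipelined parallelization of a regular DOACROSS loop produces, assuming a stencil-like loop whose iterations carry dependences on a[i-1][j] and a[i][j-1]. The names N, M, NTHREADS, BLOCK, and done[] are illustrative, not SW-VEC output, and the flag-and-flush signalling stands in for whatever synchronization the actual algorithm emits (OpenMP 4.5's ordered depend(sink/source) clauses would express the same pattern directly). BLOCK plays the role of the pipelining granularity that the DOACROSS cost model would select.

```c
#include <omp.h>
#include <stdio.h>

#define N 512
#define M 512
#define NTHREADS 4
#define BLOCK (M / NTHREADS)   /* pipelining granularity (the cost model's tunable) */

static double a[N][M];

int main(void)
{
    volatile int done[NTHREADS];       /* done[t] = last row thread t finished */
    for (int t = 0; t < NTHREADS; t++)
        done[t] = 0;

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t   = omp_get_thread_num();
        int jlo = (t == 0) ? 1 : t * BLOCK;   /* each thread owns one column block */
        int jhi = t * BLOCK + BLOCK;

        for (int i = 1; i < N; i++) {
            if (t > 0) {
                while (done[t - 1] < i)
                    ;                  /* busy-wait: left neighbor must finish row i */
                #pragma omp flush      /* acquire the neighbor's updates to a[] */
            }
            for (int j = jlo; j < jhi; j++)   /* a[i][j] needs a[i-1][j], a[i][j-1] */
                a[i][j] = 0.5 * (a[i - 1][j] + a[i][j - 1]);
            #pragma omp flush          /* publish row i before signalling */
            done[t] = i;
        }
    }
    printf("a[%d][%d] = %f\n", N - 1, M - 1, a[N - 1][M - 1]);
    return 0;
}
```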
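Second sketch (contribution 2): a hedged illustration of the shape of PS-DSWP output when expressed with OpenMP. The loop is split into a sequential stage (the pointer-chasing traversal, which the dependence graph confines to one thread) and a replicated parallel stage (the per-node work). OpenMP tasks stand in here for the inter-stage queues of the original DSWP scheme; Node and work() are invented for illustration.

```c
#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct Node { int val; struct Node *next; } Node;

static void work(Node *n) { n->val *= 2; }   /* parallel-stage body (illustrative) */

int main(void)
{
    Node *head = NULL;
    for (int i = 0; i < 100; i++) {          /* build a small list */
        Node *n = malloc(sizeof *n);
        n->val  = i;
        n->next = head;
        head    = n;
    }

    #pragma omp parallel
    #pragma omp single      /* sequential stage: one thread walks the list */
    for (Node *p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p)   /* parallel stage: any thread may run it */
        work(p);
    }
    /* implicit barrier of the parallel region: all tasks are complete here */

    long sum = 0;
    for (Node *p = head; p != NULL; p = p->next)
        sum += p->val;
    printf("sum = %ld\n", sum);              /* 2 * (0 + ... + 99) = 9900 */
    return 0;                                /* (list cleanup omitted) */
}
```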
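Third sketch (contribution 3): a minimal two-level, modular profit-estimation cost model of the kind the abstract describes. The lower level estimates the cost of one iteration from simple operation counts; the upper level combines it with parallel overheads and compares against the sequential cost. All constants are illustrative placeholders, not SW-VEC's calibrated values.

```c
#include <stdio.h>

/* --- lower level: per-iteration cost from operation counts --- */
typedef struct {
    long arith_ops;     /* arithmetic operations per iteration */
    long mem_ops;       /* memory accesses per iteration */
} IterProfile;

static double iter_cost(IterProfile p)
{
    const double C_ARITH = 1.0, C_MEM = 4.0;   /* assumed relative costs */
    return p.arith_ops * C_ARITH + p.mem_ops * C_MEM;
}

/* --- upper level: whole-loop sequential vs. parallel cost --- */
static double seq_cost(IterProfile p, long trips)
{
    return trips * iter_cost(p);
}

static double par_cost(IterProfile p, long trips, int nthreads)
{
    const double FORK_JOIN = 5000.0;   /* assumed parallel-region overhead */
    const double PER_CHUNK = 50.0;     /* assumed scheduling cost per thread */
    return FORK_JOIN + nthreads * PER_CHUNK + seq_cost(p, trips) / nthreads;
}

/* parallelize only when the estimated parallel cost is lower */
static int profitable(IterProfile p, long trips, int nthreads)
{
    return par_cost(p, trips, nthreads) < seq_cost(p, trips);
}

int main(void)
{
    IterProfile p = { .arith_ops = 8, .mem_ops = 4 };
    printf("trips=100:    %s\n", profitable(p, 100,    8) ? "parallelize" : "keep serial");
    printf("trips=100000: %s\n", profitable(p, 100000, 8) ? "parallelize" : "keep serial");
    return 0;
}
```

Because each level only exposes a cost function, a module (say, the memory-cost estimator) can be replaced or refined without touching the profit decision, which is the flexibility the modularized, hierarchical design aims at.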
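Fourth sketch (contribution 4): a loose illustration of the "blocked regular array region" abstraction. Here each dimension of a transferred region is a [lower, upper, stride] triple, and two regions over the same array are merged with a conservative union before a single transfer is generated. Both the representation and the bounding-box union are invented for illustration; the dissertation computes exact regions with the polyhedral model and couples them to a proposed OpenMP extension.

```c
#include <stdio.h>

#define MAXDIM 4

typedef struct {
    int  ndim;
    long lo[MAXDIM], hi[MAXDIM], stride[MAXDIM];
} Region;

/* conservative union: per-dimension bounding box, strides merged as dense */
static Region region_union(Region a, Region b)
{
    Region r = a;
    for (int d = 0; d < a.ndim; d++) {
        r.lo[d]     = a.lo[d] < b.lo[d] ? a.lo[d] : b.lo[d];
        r.hi[d]     = a.hi[d] > b.hi[d] ? a.hi[d] : b.hi[d];
        r.stride[d] = 1;       /* over-approximate mixed strides */
    }
    return r;
}

static long region_elems(Region r)     /* number of elements to transfer */
{
    long n = 1;
    for (int d = 0; d < r.ndim; d++)
        n *= (r.hi[d] - r.lo[d]) / r.stride[d] + 1;
    return n;
}

int main(void)
{
    /* two read regions of one 2-D array, e.g. from a[i-1][j] and a[i+1][j] */
    Region r1 = { 2, {0, 0}, {126, 1023}, {1, 1} };
    Region r2 = { 2, {2, 0}, {128, 1023}, {1, 1} };
    Region u  = region_union(r1, r2);
    printf("union transfers %ld elements\n", region_elems(u));
    return 0;
}
```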
Keywords/Search Tags: Parallel Compilation, Shared Memory Architecture, OpenMP Programming Model, Pipelining Parallelization, PS-DSWP Parallelization, Cost Model, Data Transfer Optimization