
Research On Automatic Parallelization And Optimization Technologies For Shared Memory Architecture

Posted on: 2014-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Liu
GTID: 1268330401476864
Subject: Computer Science and Technology
Abstract/Summary:
The peak speed of high-performance computers has been pushed to new heights again and again by the adoption and innovation of parallel architectures. However, the difficulty of programming these architectures poses a huge challenge for programmers. Among the approaches to the parallel programming problem, one that is very promising yet very challenging is automatic parallelization: the process of automatically converting a sequential program into a version that runs directly on multiple processing elements without altering the program's semantics. Because it demands little effort from programmers, automatic parallelization is very attractive.

Shared memory architectures have always played an important role in the development of high-performance architectures. Automatic parallelization techniques for shared memory architectures have been studied widely in the domain of scientific and numeric applications, but many challenges remain in the automatic generation of high-performance parallel programs, such as the parallelization of loops with inter-iteration dependences, the cost models used for profit estimation, and data transfer optimization for heterogeneous platforms.

This dissertation, built on the research and development of the automatic parallelizing compiler SW-VEC, addresses automatic parallelization and optimization technologies for shared memory architectures. Its main contributions and innovations are as follows:

1. A pipelining-granularity optimization algorithm based on a DOACROSS cost model is proposed to obtain the optimal pipelining granularity, together with an OpenMP-based automatic pipelining parallelization algorithm for regular DOACROSS loops on shared memory platforms. Together, these algorithms enable the automatic generation of effective pipelined code (see the first sketch after this abstract).

2. An improved OpenMP-based PS-DSWP algorithm is proposed. It is implemented on a high-level intermediate representation and therefore does not depend on any particular CPU architecture. Moreover, the Program Dependence Graph (PDG) used by the algorithm is built over basic blocks, which exploits coarser-grained parallelism than the original PS-DSWP transformation, whose PDG is built over individual instructions. OpenMP is employed to assign tasks and to synchronize threads while avoiding platform-specific limitations (see the second sketch).

3. A compile-time cost model for estimating the profit of automatic parallelization is built following a modularized, hierarchical strategy that partitions the model into two levels. The major benefit of this strategy is that the model is easy to design and implement, and the result is flexible and extensible (see the third sketch).

4. An approach to managing data storage and data transfer between main memory and local memory is proposed by designing a potential extension to OpenMP. Blocked regular array regions and a union operation over them are defined to describe the set of transferred array data, and a method is developed to compute these array regions with the polyhedral model (see the fourth sketch).

The algorithms and the cost model presented in this dissertation have been implemented and applied in the SW-VEC system, and their effectiveness has been validated.
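First sketch (contribution 1): a minimal illustration of the code pattern that pipelined parallelization of a regular DOACROSS loop produces, assuming a stencil-like loop whose iterations carry dependences on a[i-1][j] and a[i][j-1]. The names N, M, NTHREADS, BLOCK, and done[] are illustrative, not SW-VEC output, and the flag-and-flush signalling stands in for whatever synchronization the actual algorithm emits (OpenMP 4.5's ordered depend(sink/source) clauses would express the same pattern directly). BLOCK plays the role of the pipelining granularity that the DOACROSS cost model would select.

```c
#include <omp.h>
#include <stdio.h>

#define N 512
#define M 512
#define NTHREADS 4
#define BLOCK (M / NTHREADS)   /* pipelining granularity (the cost model's tunable) */

static double a[N][M];

int main(void)
{
    volatile int done[NTHREADS];       /* done[t] = last row thread t finished */
    for (int t = 0; t < NTHREADS; t++)
        done[t] = 0;

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t   = omp_get_thread_num();
        int jlo = (t == 0) ? 1 : t * BLOCK;   /* each thread owns one column block */
        int jhi = t * BLOCK + BLOCK;

        for (int i = 1; i < N; i++) {
            if (t > 0) {
                while (done[t - 1] < i)
                    ;                  /* busy-wait: left neighbor must finish row i */
                #pragma omp flush      /* acquire the neighbor's updates to a[] */
            }
            for (int j = jlo; j < jhi; j++)   /* a[i][j] needs a[i-1][j], a[i][j-1] */
                a[i][j] = 0.5 * (a[i - 1][j] + a[i][j - 1]);
            #pragma omp flush          /* publish row i before signalling */
            done[t] = i;
        }
    }
    printf("a[%d][%d] = %f\n", N - 1, M - 1, a[N - 1][M - 1]);
    return 0;
}
```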
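Second sketch (contribution 2): a hedged illustration of the shape of PS-DSWP output when expressed with OpenMP. The loop is split into a sequential stage (the pointer-chasing traversal, which the dependence graph confines to one thread) and a replicated parallel stage (the per-node work). OpenMP tasks stand in here for the inter-stage queues of the original DSWP scheme; Node and work() are invented for illustration.

```c
#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct Node { int val; struct Node *next; } Node;

static void work(Node *n) { n->val *= 2; }   /* parallel-stage body (illustrative) */

int main(void)
{
    Node *head = NULL;
    for (int i = 0; i < 100; i++) {          /* build a small list */
        Node *n = malloc(sizeof *n);
        n->val  = i;
        n->next = head;
        head    = n;
    }

    #pragma omp parallel
    #pragma omp single      /* sequential stage: one thread walks the list */
    for (Node *p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p)   /* parallel stage: any thread may run it */
        work(p);
    }
    /* implicit barrier of the parallel region: all tasks are complete here */

    long sum = 0;
    for (Node *p = head; p != NULL; p = p->next)
        sum += p->val;
    printf("sum = %ld\n", sum);              /* 2 * (0 + ... + 99) = 9900 */
    return 0;                                /* (list cleanup omitted) */
}
```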
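Third sketch (contribution 3): a minimal two-level, modular profit-estimation cost model of the kind the abstract describes. The lower level estimates the cost of one iteration from simple operation counts; the upper level combines it with parallel overheads and compares against the sequential cost. All constants are illustrative placeholders, not SW-VEC's calibrated values.

```c
#include <stdio.h>

/* --- lower level: per-iteration cost from operation counts --- */
typedef struct {
    long arith_ops;     /* arithmetic operations per iteration */
    long mem_ops;       /* memory accesses per iteration */
} IterProfile;

static double iter_cost(IterProfile p)
{
    const double C_ARITH = 1.0, C_MEM = 4.0;   /* assumed relative costs */
    return p.arith_ops * C_ARITH + p.mem_ops * C_MEM;
}

/* --- upper level: whole-loop sequential vs. parallel cost --- */
static double seq_cost(IterProfile p, long trips)
{
    return trips * iter_cost(p);
}

static double par_cost(IterProfile p, long trips, int nthreads)
{
    const double FORK_JOIN = 5000.0;   /* assumed parallel-region overhead */
    const double PER_CHUNK = 50.0;     /* assumed scheduling cost per thread */
    return FORK_JOIN + nthreads * PER_CHUNK + seq_cost(p, trips) / nthreads;
}

/* parallelize only when the estimated parallel cost is lower */
static int profitable(IterProfile p, long trips, int nthreads)
{
    return par_cost(p, trips, nthreads) < seq_cost(p, trips);
}

int main(void)
{
    IterProfile p = { .arith_ops = 8, .mem_ops = 4 };
    printf("trips=100:    %s\n", profitable(p, 100,    8) ? "parallelize" : "keep serial");
    printf("trips=100000: %s\n", profitable(p, 100000, 8) ? "parallelize" : "keep serial");
    return 0;
}
```

Because each level only exposes a cost function, a module (say, the memory-cost estimator) can be replaced or refined without touching the profit decision, which is the flexibility the modularized, hierarchical design aims at.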
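Fourth sketch (contribution 4): a loose illustration of the "blocked regular array region" abstraction. Here each dimension of a transferred region is a [lower, upper, stride] triple, and two regions over the same array are merged with a conservative union before a single transfer is generated. Both the representation and the bounding-box union are invented for illustration; the dissertation computes exact regions with the polyhedral model and couples them to a proposed OpenMP extension.

```c
#include <stdio.h>

#define MAXDIM 4

typedef struct {
    int  ndim;
    long lo[MAXDIM], hi[MAXDIM], stride[MAXDIM];
} Region;

/* conservative union: per-dimension bounding box, strides merged as dense */
static Region region_union(Region a, Region b)
{
    Region r = a;
    for (int d = 0; d < a.ndim; d++) {
        r.lo[d]     = a.lo[d] < b.lo[d] ? a.lo[d] : b.lo[d];
        r.hi[d]     = a.hi[d] > b.hi[d] ? a.hi[d] : b.hi[d];
        r.stride[d] = 1;       /* over-approximate mixed strides */
    }
    return r;
}

static long region_elems(Region r)     /* number of elements to transfer */
{
    long n = 1;
    for (int d = 0; d < r.ndim; d++)
        n *= (r.hi[d] - r.lo[d]) / r.stride[d] + 1;
    return n;
}

int main(void)
{
    /* two read regions of one 2-D array, e.g. from a[i-1][j] and a[i+1][j] */
    Region r1 = { 2, {0, 0}, {126, 1023}, {1, 1} };
    Region r2 = { 2, {2, 0}, {128, 1023}, {1, 1} };
    Region u  = region_union(r1, r2);
    printf("union transfers %ld elements\n", region_elems(u));
    return 0;
}
```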
Keywords/Search Tags: Parallel Compilation, Shared Memory Architecture, OpenMP Programming Model, Pipelining Parallelization, PS-DSWP Parallelization, Cost Model, Data Transfer Optimization