
Research On Optimization Technologies Of Parallel Compilation For Distributed Memory Architecture

Posted on: 2013-03-06    Degree: Master    Type: Thesis
Country: China    Candidate: J Zhao    Full Text: PDF
GTID: 2248330395980589    Subject: Computer software and theory
Abstract/Summary:
Nowadays, mainstream high-performance computers worldwide provide multiple layers of parallel computing resources, including inter-node parallelism, intra-node parallelism, and SIMD short-vector functional units. How to use a parallelizing compiler system to exploit the multi-level, multi-grained parallelism present in practical applications has become an important technique for improving the application performance of high-performance computer systems. In recent years, a large body of work has addressed SIMD vectorization of serial programs and intra-node parallelization, and the corresponding technologies are relatively mature. By contrast, parallelizing compilation for distributed memory architectures has developed more slowly; despite many notable research achievements, numerous problems remain to be studied and solved. This thesis studies optimization technologies of parallelizing compilation for distributed memory architectures based on dependence analysis. Its main contents are as follows.

1. A survey of the domestic and international state of the art in parallelizing compiler systems and optimization technologies. Following the development of high-performance computer architectures and their current mainstream forms, we reviewed the research and development status of parallelizing compiler systems at home and abroad, especially those targeting distributed memory architectures. Based on the research foundations and principles of the optimization technologies, we argued for the fundamental role of dependence analysis, thereby motivating the content and significance of our work.

2. Exploitation of parallelism for distributed memory architectures in the presence of dependences. Dependence is one of the critical factors limiting the parallelism of programs.
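As a simple illustration of how a dependence test can rule out spurious (pseudo) dependences, the classical GCD test for linear array subscripts can be sketched as below. This is a hedged, minimal example; the thesis's own test is based on a quadratic programming model of nonlinear subscripts, which is not reproduced here.

```python
from math import gcd

def gcd_test(a, b, c):
    """Classical GCD dependence test: one iteration writes A[a*i + c1],
    another reads A[b*j + c2].  A dependence requires an integer solution
    of a*i - b*j = c, where c = c2 - c1.  Returns False when no solution
    exists (independence proven); True means a dependence is possible."""
    if a == 0 and b == 0:
        return c == 0
    g = gcd(abs(a), abs(b))
    # A linear Diophantine equation a*i - b*j = c is solvable in
    # integers iff gcd(a, b) divides c.
    return c % g == 0

# A[2*i] written vs A[2*j + 1] read: 2*i - 2*j = 1 has no integer
# solution, so the test proves the accesses are independent.
print(gcd_test(2, 2, 1))   # False: no dependence
print(gcd_test(2, 4, 6))   # True: dependence possible
```

The GCD test is conservative: a `True` result only means a dependence could not be disproved, which is exactly why more precise tests (such as the nonlinear test studied in this thesis) are worthwhile.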
To eliminate pseudo dependences in programs, we studied an accurate dependence testing method; by analyzing the different kinds of dependences, we further exploited parallelism for distributed memory architectures even when dependences exist. The main contributions of this part are the following. By constructing a quadratic programming model of array subscripts, we proposed a nonlinear array-subscript dependence test that efficiently eliminates a portion of the pseudo dependences in programs. We also implemented an MPI auto-parallelization method for loops with loop-carried anti-dependences by creating appropriate copies of the dependent data.

3. Parallelism recognition based on classifying nested loops. After summarizing the parallelism recognition technologies of existing optimizing compilers and considering the structural characteristics of nested loops, we developed a method that improves the ability to recognize parallelism. Furthermore, using cost analysis and loop optimization technologies, we prevented loops whose parallelization would be unprofitable from being parallelized. The main contributions of this part are as follows. We presented a novel classification method for nested loops and implemented automatic parallelism recognition based on it. Using cost analysis and loop optimization, loops whose parallelization would be unprofitable were efficiently excluded from parallelization, which improved the efficiency of the generated parallel programs.

4. Communication optimization based on array data-flow analysis. We introduced the significance of communication optimization for parallel programs on distributed memory architectures, summarized communication optimization technologies by distinguishing optimizations applied before code generation from those applied after it, and constructed accurate array data-flow information through dependence analysis to compute communication data exactly.
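The idea behind eliminating a loop-carried anti-dependence by copying, mentioned in part 2, can be sketched as follows. This is a hedged illustration with hypothetical data, not the thesis's MPI implementation: it only shows why a copy of the read-before-written data makes the iterations independent and therefore distributable.

```python
def serial(a):
    # Loop-carried anti-dependence: iteration i reads a[i+1],
    # which the later iteration i+1 overwrites.
    a = list(a)
    for i in range(len(a) - 1):
        a[i] = a[i + 1] + 1
    return a

def with_copy(a):
    # Eliminating the anti-dependence by copying the dependent data:
    # every iteration reads the snapshot `old`, so iterations no longer
    # interfere and the loop could be split across MPI ranks, each
    # computing a block of i against its portion of the copy.
    old = list(a)
    a = list(a)
    for i in range(len(a) - 1):   # this loop is now fully parallel
        a[i] = old[i + 1] + 1
    return a

data = [3, 1, 4, 1, 5]
print(serial(data))      # [2, 5, 2, 6, 5]
print(with_copy(data))   # identical result, but parallelizable
```

The copy costs extra memory, which is one reason the thesis emphasizes creating *reasonable* copies rather than duplicating whole arrays unconditionally.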
The main contributions of this part are the following. Drawing on inter-procedural analysis and optimization technologies, we proposed a novel communication-data calculation algorithm based on constructing accurate array data flow, which significantly eliminates redundant communication in message-passing parallel programs while also reducing the time complexity of the communication-data calculation.

5. Testing and experimental verification. First, we tested each optimization technology proposed in this thesis separately and verified the significance and effect of the research using benchmark programs. Second, through joint testing, we validated the parallelizing compiler system's support for exploiting multi-grained parallelism and its efficiency.

Finally, we summarized the research work of the whole thesis and discussed plans for future research.
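The exact communication-data computation described in part 4 can be sketched, under simplifying assumptions, as intersecting the array section a writer owns with the section a reader actually references; sending only this intersection avoids redundant communication. The block layout and function names below are hypothetical illustrations, not the thesis's inter-procedural algorithm.

```python
def local_block(rank, nprocs, n):
    """Block distribution (hypothetical layout): the half-open index
    range of an n-element array owned by `rank`."""
    size = (n + nprocs - 1) // nprocs
    lo = rank * size
    return (lo, min(lo + size, n))

def send_set(writer, reader_needs, nprocs, n):
    """Exact communication set: intersect what `writer` owns with the
    half-open index range the reader actually references.  Ranks whose
    intersection is empty send nothing at all."""
    lo, hi = local_block(writer, nprocs, n)
    need_lo, need_hi = reader_needs
    s_lo, s_hi = max(lo, need_lo), min(hi, need_hi)
    return (s_lo, s_hi) if s_lo < s_hi else None

# 100-element array block-distributed over 4 ranks; the reader only
# touches A[20:30], so only ranks 0 and 1 contribute data.
print(send_set(0, (20, 30), 4, 100))  # (20, 25)
print(send_set(1, (20, 30), 4, 100))  # (25, 30)
print(send_set(2, (20, 30), 4, 100))  # None
```

A naive scheme would broadcast each rank's whole block; computing the intersection is the essence of replacing that with the minimal message, which is what an accurate array data-flow analysis enables.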
Keywords/Search Tags: High Performance Computing, Parallelizing Compiler, Distributed Memory Architecture, Dependence, Parallelism Recognition, Communication Optimization, Cost Analysis, Loop Optimization, Array Data Stream, Inter-procedural Analysis and Optimization