Font Size: a A A

Research On Technologies Of Data Distribution And Code Generation For Distributed Memory Architecture

Posted on:2014-09-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:R DingFull Text:PDF
GTID:1268330401476876Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Nowadays the mainstream high performance computers all over the world are mostlyprovided with multilayer parallelizing computing resources consisting of distributed memoryarchitecture, shared memory architecture, SIMD short vector functional units and so on. Theyoffer support for multi-level parallel programs’ execution. How to utilize the parallelizingcompiler system to explore the parallelism of different levels and multi-grained existing in thepractical applications has became an important technology to improve the applicationperformance of high performance computer system.Since every computing node has its own memory in distributed memory parallel computers,the nodes need explicit message passing to accomplish the data exchange with each other. Thismeans that automatic parallelizing should find out both the computation partition and datadistribution of serial programs in distributed memory architecture to make each computationnode use local data as possible for the acquisition of greatest data locality and parallelism.Otherwise, because local memory access speed of the computing node is much faster thanremote memory access, the efficiency of communication code has a direct impact on the programparallel performance. For this purpose, the paralleling compiler should generate correct andefficient communication code to maintenance the data consistency.This dissertation takes parallelizing compiler SW-VEC as research background anddiscusses several problems of data distribution and code generation of parallelizing compiler onthe distributed memory in detail. The main contents and contributions are as follows:1. Array data flow analysis for data distribution. The traditional analysis methods of arraydata-flow focus on accurate dependence testing and array privatization, etc. They can’t offerdetailed information of array’s define-use inter-loops and have impact to the choice for datadistribution. In allusion to this problem, this paper proposes a new array data-flow analysismethod for data distribution, which represents array element’s data-flow of inter-loops bydefine-use graph. The algorithm offers detailed data-flow information for the optimizationselection to implement data distribution. The experimental results show that the algorithm abovemay improve the precision of data distribution algorithm’s assessment of parallel benefits andreduce the communication redundancy of code generated.2. Affine decomposition for distributed memory architecture. In order to give considerationto the parallelism for both shared memory and distributed memory, tight partition constraint incurrent affine decomposition has limited the parallel development for codes includinganti-dependence and output-dependence through automatic parallelization process. Focus on the above problem of affine decomposition, this dissertation make an improvement to the affinedecomposition by establishing the partition constraints aiming at the characteristic of privatememory under the distributed memory architecture. This improvement eliminated the redundantconstraint of anti-dependence and output-dependence under the distributed memory architectureand promoted the exploitation ability of parallel compiler.3. Array life cycle in the data decomposition. The existing research of data distribution hasnone considered of the influence of the array life cycle to data decomposition, so theinconsistency of decomposition in different array life cycles leads to communication redundancy,also impact the assessment of parallel benefits and reduce the selection of the datadecomposition. Thus, this dissertation proposes a new data decomposition algorithm based onarray life cycle, it creates own decomposition for each life cycle of array by using decompositionmapping table and eliminates the useless association of data decomposition in different life cycle.So, the times and the communication cost in decomposition of the data redistribution aresignificantly reduced.4. The data decomposition optimization based on dominant value. The data distributionresult of the array which migration is the most expensive has a direct impact on the suitability ofdata decomposition. A data decomposition optimization algorithm based on dominant value ispresented in this dissertation. The algorithm quantifies the impact of array to the programs’parallelism as the dominant value to guide the priorities of distribution, then find the datadistribution strategy of array with higher dominant value first, and push back distribution ofothers array to reduce their constraints to the dominant array. The communication redundancy ofdominant array’s redistribution is efficiently reduced.5. The generation and optimization for accurate communication code. Traditional researchresolve communication mainly by generating code that each processor sends all data defined inits local memory to all processors, but this will bring large amount of communicationredundancy, which increase with the growth of number of processors. Focus on this problem, thisdissertation proposes an accurate communication code generation algorithm, which determinessource processor and goals processor of each array element’s migration in array redistribution bythe transformation of data decompositions, then generate accurate communication code.Experimental results show that the algorithm can eliminate the communication redundancyusefully and improve the performance of paralleling program automatic generated.6. The data distribution and code generation of irregular problem. Many large-scalescientific applications contain irregular loops. But the prior work of automatic parallelization ondistributed memory is hardly to partition the data of irregular problem and generate availabilitycommunication code at compile-time. In this dissertation, we propose an approach for effective data distribution and code generation for a common class of irregular problem contains affineparallelizable loop. The approach distribute data accessed by irregular arrays onto each processorby computation decomposition and access expression of array references, and satisfy theproducer-consumer relation of irregular array references by redundancy replication technique.The experimental results verify the validity of the approach and improve the speedup of testapplications.The algorithm presented in the thesis has been carried out and applied in the SW-VECsystem, and the validity has been proved.
Keywords/Search Tags:Distributed Memory Architecture, Paralleling Compiling, Data Distribution, CodeGeneration, Array Life Cycle, Accurate Communication, Dominant Array
PDF Full Text Request
Related items