Mechanisms for hiding communication latency in data parallel architecture

Posted on:1998-03-22

Degree:Ph.D

Type:Dissertation

University:Georgia Institute of Technology

Candidate:Garg, Vivek

GTID:1468390014979850

Subject:Electrical engineering

Abstract/Summary:

The goal of this dissertation is to explore techniques to improve the performance of interprocessor communication on data parallel architectures. In many cases, interprocessor communication latency is a significant fraction of the overall time required to execute data parallel applications. The lockstep model of execution used by more traditional SIMD machines does not admit any possibility of exploiting task parallelism. We will demonstrate that a shift in the SIMD paradigm to enable a small degree of task parallelism can reduce the communication overhead independent of any improvements in technology.

In this work, we identify two primary mechanisms for exploiting communication concurrency in data parallel applications: overlapping communication with computation, and overlapping communication with other communication. We propose an architectural framework, referred to as concurrently communicating SIMD (CCSIMD), to exploit communication concurrency in data parallel applications, and study three specific implementations of CCSIMD. The impact of these architectures on a suite of data parallel applications is studied. Results show that exploiting communication concurrency can lead to significant improvements in performance. For well-balanced architectures, overlapping communication with other computation offers the most benefit. In architectures with relatively stronger computational support, overlapping communication with other communication is better. Combining the two techniques can often result in better performance than employing either technique by itself.
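
The two overlap mechanisms named in the abstract can be illustrated with a toy timing model. This is only a sketch under idealized assumptions, not the dissertation's performance model or any of its CCSIMD implementations: the function name `phase_time`, its parameters, and the numbers are all hypothetical. It assumes a phase consists of one block of computation and several equal-cost messages, that overlap with computation is perfect, and that a fixed number of messages can be in flight at once.

```python
import math

def phase_time(compute, n_msgs, msg_time, overlap_compute=False, concurrent_msgs=1):
    """Idealized time for one phase (hypothetical toy model).

    compute          time spent computing in the phase
    n_msgs           number of communication operations in the phase
    msg_time         time for one communication operation
    overlap_compute  if True, communication runs fully in parallel with computation
    concurrent_msgs  how many communication operations may proceed at once
    """
    # Overlapping communication with other communication shortens the
    # communication part: messages proceed in batches of `concurrent_msgs`.
    comm = math.ceil(n_msgs / concurrent_msgs) * msg_time
    if overlap_compute:
        # Overlapping communication with computation: the longer of the two
        # determines the phase time.
        return max(compute, comm)
    # Lockstep execution: computation and communication are serialized.
    return compute + comm

def compare(compute, n_msgs, msg_time, concurrent_msgs):
    """Return the phase time under each of the four configurations."""
    return {
        "lockstep": phase_time(compute, n_msgs, msg_time),
        "comm/compute overlap": phase_time(compute, n_msgs, msg_time, overlap_compute=True),
        "comm/comm overlap": phase_time(compute, n_msgs, msg_time, concurrent_msgs=concurrent_msgs),
        "both": phase_time(compute, n_msgs, msg_time, overlap_compute=True,
                           concurrent_msgs=concurrent_msgs),
    }

# Balanced case: computation time equals total communication time.
balanced = compare(compute=8, n_msgs=4, msg_time=2, concurrent_msgs=2)
# Compute-strong case: computation is short relative to communication.
compute_strong = compare(compute=2, n_msgs=4, msg_time=2, concurrent_msgs=4)
```

With these made-up numbers, the balanced case gains more from overlapping communication with computation, the compute-strong case gains more from overlapping communication with other communication, and in both cases combining the two is at least as fast as either alone. Real machines will not achieve perfect overlap, so the model shows only the direction of the effects.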

Keywords/Search Tags:

Communication, Data parallel, Performance

Related items

1	Mechanisms for hiding communication latency in data parallel architecture
2	Research And Implementation Of Parallel RTI Based On High Performance Computing Environment
3	Parallel Program Execution Model On Data Communication Optimization
4	Performance analysis of one-dimensional fast Fourier transform on parallel systems
5	NPB Performance Evaluation Of Tera-Scale Clusters And Implementation Of Parallel Non-Numerical Algorithm With Performance Analysis
6	Research And Implementation Of Parallel Program Performance Analysis System
7	High performance virtual architecture parallel libraries with data redistribution for multicomputers
8	An empirical approach to communication and performance modeling for message passing parallel applications on cluster systems
9	The Evaluation Of Parallel Algorithm Performance As Well As The Study And Implementation Of Key Technologies Of Parallel Monitor Toolkits
10	Research On Data Parallel Communication Strategy For Distributed Machine Learning System