Performance of the distributed hash join algorithms in a distributed heterogeneous supercomputing environment

Posted on: 1996-08-15
Degree: Ph.D.
Type: Thesis
University: Temple University
Candidate: Khan, Zahira Saleem
Full Text: PDF
GTID: 2468390014988677
Subject: Computer Science
Abstract/Summary:
This dissertation is based on the notion that the performance of a parallel algorithm as a whole depends on the performance of its individual components, and that these components may be better suited to different architectural models. It is hypothesized that performance improvements may be achieved by using distributed algorithms that execute each component on the architecture best suited to it.

To test this hypothesis, hash join algorithms that implement the equijoin operation of relational databases are designed and implemented in a distributed heterogeneous environment consisting of the Cray C90 and the Connection Machine (CM-2). The non-pipelined Distributed Hash Join (DHJ) and Pipelined Distributed Hash Join (PDHJ) algorithms execute the hash phase on the Cray C90 and the join phase on the CM-2.

The results of the experiments, using one processor on the Cray C90 and 8192 processors on the CM-2, indicate that the Cray C90 is better suited to the hash phase, while the join phase is suited to the CM-2 for some combinations of data size, data distribution, and join selectivity ratio.

Algorithm performance is sensitive to many factors related to the architecture, software, and problem specification. Chapter four develops and validates a procedure for normalizing timings to account for differences in processor speeds. Performance comparisons based on both normalized and actual times are reported.

To select an architecture for a given problem specification, a regression model is developed for each phase of the algorithm on each computing environment. Chapter eight shows that the predicted results are close to the observed values for some experiments and points out the usefulness of this model in selecting the best architecture, from those considered here, for a given problem specification.

The performance results obtained for the DHJ algorithm for some combinations of data distribution, relation size, and join selectivity ratio provide sufficient evidence to support the hypothesis of the dissertation. Because only a small number of experiments were conducted for the PDHJ algorithm, it is difficult to conclude that performance improvements will result from distributing the tasks to the best-suited architecture.
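
The two-phase structure of a hash join mentioned in the abstract (a hash phase followed by a join phase) can be illustrated with a minimal single-machine sketch in Python. This is not the dissertation's DHJ or PDHJ implementation: the function names, the bucket count, and the tuple layout are assumptions made for illustration only, and nothing here models the Cray C90, the CM-2, or the transfer of data between them.

```python
from collections import defaultdict


def hash_phase(relation, key_index, num_buckets):
    """Partition the tuples of a relation into buckets by hashing the join attribute.

    Hypothetical helper: corresponds only loosely to the "hash phase" in the abstract.
    """
    buckets = defaultdict(list)
    for row in relation:
        buckets[hash(row[key_index]) % num_buckets].append(row)
    return buckets


def join_phase(r_buckets, s_buckets, r_key, s_key):
    """Compare tuples only within matching buckets and emit equijoin results.

    Hypothetical helper: corresponds only loosely to the "join phase" in the abstract.
    """
    result = []
    for bucket_id, r_rows in r_buckets.items():
        for r in r_rows:
            for s in s_buckets.get(bucket_id, []):
                # Tuples in the same bucket may still differ in key value.
                if r[r_key] == s[s_key]:
                    result.append(r + s)
    return result


def hash_join(r, s, r_key=0, s_key=0, num_buckets=8):
    """Equijoin of relations r and s (lists of tuples) on the given key positions."""
    r_buckets = hash_phase(r, r_key, num_buckets)
    s_buckets = hash_phase(s, s_key, num_buckets)
    return join_phase(r_buckets, s_buckets, r_key, s_key)


if __name__ == "__main__":
    employees = [(1, "a"), (2, "b"), (3, "c")]
    departments = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    for row in sorted(hash_join(employees, departments)):
        print(row)
```

Keeping the two phases as separate functions mirrors the idea in the abstract that each phase is an independent component which could, in principle, be run on a different machine.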
Keywords/Search Tags:Performance, Algorithm, Hash join, Cray C90, Distributed, Suited, CM-2