Font Size: a A A

Parallel query processing on a cluster-based database system

Posted on:2005-11-01Degree:Ph.DType:Thesis
University:Carleton University (Canada)Candidate:Imasaki, KenjiFull Text:PDF
GTID:2458390008987914Subject:Computer Science
Abstract/Summary:
This thesis focuses on cluster-based parallel database systems in which only one of the nodes has the database and the other nodes, which have no initial data, are used for parallel query processing. In such a system, the load of each node changes dynamically depending on the activities of the local users. In addition, in database query processing, data skew exists. Thus, it is very important to develop efficient load balancing/sharing algorithms. With the increasing Internet connectivity, it also becomes necessary to query databases spread around the world. This can incur arrival delay and/or the transfer rate fluctuations of input relations. This thesis investigates query processing algorithms (single- and multiple-join) under these conditions.; For the single-join processing, a new algorithm which divides the hash buckets into chunks and uses them for load balancing is proposed. Two versions of this algorithm are introduced: a non-symmetric join algorithm (ChunkHJ) and a symmetric join algorithm (SCHJ). ChunkHJ is compared with two other algorithms: an adaptive nested-loop join and an adaptive GRACE join. These three algorithms are evaluated on a heterogeneous cluster with skewed data and various non-query background load. The results show that the proposed ChunkHJ is the best algorithm. SCHJ is compared with a dynamic round-robin algorithm, a sampling algorithm (without the Internet transfer delay) and an incremental hash mapping algorithm (with the Internet transfer delay). The results show that without the Internet transfer delay, the SCHJ algorithm is the best among these algorithms. With Internet transfer delay, the SCHJ algorithm is still better than the other incremental hash mapping algorithms.; For the multiple-join processing, the Non-Symmetric Pipelined Hash Join (NSPHJ) and Symmetric Pipelined Hash Join (SPHJ) algorithms are evaluated with and without the Internet transfer delay. The results show that NSPHJ should be used for a cluster without the Internet transfer delay. Otherwise, SPHJ should be used.
Keywords/Search Tags:Internet transfer delay, Query processing, Database, Parallel, Results show, Algorithm, SCHJ
Related items