Parallel query processing on a cluster-based database system

Posted on:2005-11-01

Degree:Ph.D

Type:Thesis

University:Carleton University (Canada)

Candidate:Imasaki, Kenji

Full Text:PDF

GTID:2458390008987914

Subject:Computer Science

Abstract/Summary:

This thesis focuses on cluster-based parallel database systems in which only one of the nodes has the database and the other nodes, which have no initial data, are used for parallel query processing. In such a system, the load of each node changes dynamically depending on the activities of the local users. In addition, in database query processing, data skew exists. Thus, it is very important to develop efficient load balancing/sharing algorithms. With the increasing Internet connectivity, it also becomes necessary to query databases spread around the world. This can incur arrival delay and/or the transfer rate fluctuations of input relations. This thesis investigates query processing algorithms (single- and multiple-join) under these conditions.; For the single-join processing, a new algorithm which divides the hash buckets into chunks and uses them for load balancing is proposed. Two versions of this algorithm are introduced: a non-symmetric join algorithm (ChunkHJ) and a symmetric join algorithm (SCHJ). ChunkHJ is compared with two other algorithms: an adaptive nested-loop join and an adaptive GRACE join. These three algorithms are evaluated on a heterogeneous cluster with skewed data and various non-query background load. The results show that the proposed ChunkHJ is the best algorithm. SCHJ is compared with a dynamic round-robin algorithm, a sampling algorithm (without the Internet transfer delay) and an incremental hash mapping algorithm (with the Internet transfer delay). The results show that without the Internet transfer delay, the SCHJ algorithm is the best among these algorithms. With Internet transfer delay, the SCHJ algorithm is still better than the other incremental hash mapping algorithms.; For the multiple-join processing, the Non-Symmetric Pipelined Hash Join (NSPHJ) and Symmetric Pipelined Hash Join (SPHJ) algorithms are evaluated with and without the Internet transfer delay. The results show that NSPHJ should be used for a cluster without the Internet transfer delay. Otherwise, SPHJ should be used.

Keywords/Search Tags:

Internet transfer delay, Query processing, Database, Parallel, Results show, Algorithm, SCHJ

Related items

1	Parallel Query Processing Techniques In Parallel Database System PBASE/2
2	Research On Key Techniques Of Query Processing In Spatio-Temporal Databases
3	Parallel Processing And Optimization Of Database Query Under Distributed Architecture
4	Object-oriented Databases Parallel Query Processing And Transaction Management,
5	The Research Of Fuzzy Query Results Ranking Based On Relational Database And Its Application In Realestate Information Search System
6	Research On Key Techniques Of Parallel Multiple Query Processing
7	The Research Of Personalized Categorization Approach For Web Database Query Results
8	Research Of Main Memory Database Parallel Query Processing On SMP
9	Parallel Database Query Optimization Genetic Algorithm
10	The Research And Implement Of Parallel Query Over Massive Data On Multi Database