Font Size: a A A

Complete overflow management algorithm to join large relational tables

Posted on:2003-02-23Degree:Ph.DType:Dissertation
University:North Dakota State UniversityCandidate:Goli, Venkata Nagarjuna RaoFull Text:PDF
GTID:1468390011479292Subject:Statistics
Abstract/Summary:
A new hash based algorithm, the Complete Overflow Management join Algorithm, (COMA), was proposed as a resolution for collisions in hash table and main memory bucket overflows during hash join processing. Collisions and bucket overflows are caused by skew in the data. Changes in the available main memory size cause bucket overflows. Collisions and bucket overflows during hash join result in excessive disk I/O. Traditional join algorithms, such as Join Indices, Hybrid hash, and Domain Vector hash, assume that the data distribution is normal. They collapse when the hashing algorithm employed affects data distribution. COMA sets aside a page of memory to copy the overflowing records. In addition, COMA employs orthogonal hashing functions to avoid collisions and rehashes the records to avoid bucket overflows. In order to accommodate fluctuations in the main memory size, COMA chooses the smaller of the build or probe buckets. This selection process, in turn, facilitates merging of multiple buckets. A cost model of the algorithm was presented. Implementation aspects of the algorithm were discussed. COMA was implemented, and the performance was compared with other similar join algorithms using the Wisconsin benchmark for varying parameters. Analysis results show that, when skew and kurtosis are high, COMA is the best. Specific proposals were made for extending COMA to applying with configurators, in data warehouses, multi-level join queries, mobile databases, and statistical databases to achieve better performance.
Keywords/Search Tags:Join, COMA, Algorithm, Hash, Bucket overflows, Collisions, Data
Related items