Font Size: a A A

Hash-based data structures for extreme conditions

Posted on:2009-02-02Degree:Ph.DType:Thesis
University:Harvard UniversityCandidate:Kirsch, Adam LavittFull Text:PDF
GTID:2448390005960316Subject:Computer Science
Abstract/Summary:
This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical.; First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query intensive applications.; The rest of this work focuses on designing space-efficient, open-addressed, multiple choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially and the hardware complications of parallel examinations. Improving on previous constructions, we show that a small Bloom filter-based data structure in fast memory can essentially allow us to use multiple hash functions while only examining a single bucket during a hash table operation.; For scenarios where we can afford the parallelization above, the space utilization of standard multiple choice hash table constructions can be improved by allowing items to be moved within the hash table after they are initially inserted. While there are a number of known hash table constructions with this property, the worst case insertion times are too large for the applications we consider. To address this problem, we introduce and analyze a wide variety of hash table constructions that move at most one item in the during the insertion of a new item. Using differential equation approximations and numerical methods, we are able to quantify the performance of our schemes tightly and show that they are superior to standard constructions that do not allow moves.
Keywords/Search Tags:Hash, Constructions, Data, Show, Bloom
Related items