Font Size: a A A

Query Rate Aware Incremental Index Updat

Posted on:2018-01-30Degree:Ph.DType:Dissertation
University:New York University Tandon School of EngineeringCandidate:Nepomnyachiy, SergeyFull Text:PDF
GTID:1478390020457565Subject:Computer Science
Abstract/Summary:
Inverted index files are commonly used to support keyword search in document collections. While the offline construction of an index can be done efficiently, its incremental update remains a hard problem, especially when the index does not completely fit into memory and the system has to process a query stream at the same time. We propose a novel approach for maintaining up-to-date index files on a system that constantly serves document updates and user queries. Unlike previous updating policies, we apply the knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes for selective enforcement of contiguous layout of the data on disk, while mandating that the cost of the consolidation is less than its estimated benefit. The first is the "greedy merge" inspired by the ski-rental problem, as studied in the context of competitive analysis. The second is the "opportunistic prognosticator" -- by making reliable predictions, the online problem becomes suitable for offline optimizations. We compare the performance of our system with baseline algorithms for incremental indexing in three different settings: an "append-only" system that grows indefinitely, and two fixed size systems where documents are deleted to accommodate new data, in either chronological order or in arbitrary order.
Keywords/Search Tags:Index, Query, Incremental, System
Related items