Font Size: a A A

Distance-based indexing: Observations, applications, and improvements

Posted on:2007-07-24Degree:Ph.DType:Dissertation
University:Case Western Reserve UniversityCandidate:Tasan, MuratFull Text:PDF
GTID:1448390005961777Subject:Computer Science
Abstract/Summary:
Multidimensional indexing has long been an active research problem in computer science. Most solutions involve the mapping of complex data types to high-dimensional vectors of fixed length and applying either Spatial Access Methods (SAMs) or Point Access Methods (PAMs) to the vectorized data.; In more recent times, however, this approach has found its limitations. Much of the current data is either difficult to map to a fixed-length vector (such as arbitrary length strings), or maps only successfully to a very high number of dimensions. In both cases, Distance-Based Indexing serves as an attractive alternative, relying only on the pairwise distance information of data items to build indices that offer efficient similarity search retrieval.; In this work, distance-based indexing is approached first in a general fashion, where a framework is laid out that encompasses both distance-based indexing methods as well as SAMs and PAMs. Shared properties of various seemingly unrelated data structures can be exploited, as is shown by the presentation of a single (and optimal) search algorithm that works on a variety of trees for a variety of different search types.; The motivation for distance-based indexing is then shown via an application of indexing strings (biological sequences, to be exact). By simply showing that a distance function satisfies the properties of a metric, it is illustrated that many forms of data, with various distribution characteristics can successfully be indexed with distance-based indexing.; Finally, a probabilistic approach towards indexing leads to an improved tree construction algorithm, as well as an information based search algorithm that searches the information stored in any data structure, regardless of the form (i.e., whether the structure is a tree or a matrix, the algorithm performs equally well).
Keywords/Search Tags:Indexing, Data, Search, Algorithm
Related items