Font Size: a A A

Efficient similarity search based on data distribution properties in high dimensions

Posted on:2002-01-31Degree:Ph.DType:Dissertation
University:Michigan State UniversityCandidate:Li, JinhuaFull Text:PDF
GTID:1468390011492035Subject:Computer Science
Abstract/Summary:
The characteristics of data distribution in high dimensional Euclidean space play a fundamental role in many areas of computer science, including database systems, pattern recognition, and multimedia. We have studied and formalized a few data distribution properties in high dimensional Euclidean space, such as the distance properties and the angle property. We have exploited these basic properties in developing efficient applications, such as the nearest neighbor queries in high dimensions.; Nearest Neighbor search is a fundamental task in many applications. At present state of the art approaches to nearest neighbor search are not efficient in high dimensions. In this dissertation we present an efficient Angle based Balanced index structure called the AB-tree, which uses heuristics to decide whether or not to access a node in the index tree based on the estimated angle and the weight of the node. Extensive experiments on synthetic data as well as real data demonstrate that the AB-tree outperforms other index structures such as the SS-tree by orders of magnitude while maintaining 90% true nearest neighbors.
Keywords/Search Tags:Data distribution, Nearest neighbor, Efficient, Search
Related items