Font Size: a A A

Fusion coding, search and retrieval of correlated data

Posted on:2010-12-16Degree:Ph.DType:Dissertation
University:University of California, Santa BarbaraCandidate:Ramaswamy, SharadhFull Text:PDF
GTID:1448390002476986Subject:Engineering
Abstract/Summary:
This work is a collection of results on novel uses of quantizers for two problems in database management, namely, similarity search in high-dimensional databases, and fusion-compression for selective retrieval of correlated data sources (data-streams). The results can be grouped into three main categories, each of which occupies a paragraph below.;Similarity search for nearest neighbors in databases where entries are represented by high-dimensional, correlated features, is challenging and compression is a reliable means to speed up search time. It is shown how to efficiently use the vector quantization scheme for exact similarity search. A new cluster distance bounding technique, based on separating hyperplanes and cluster supports, is presented and shown to achieve exact similarity search while requiring the retrieval of only a few relevant clusters. Next, a property of point-to-hyperplane distance is derived that enables extending the query-cluster distance bound to an adaptively changing Mahalanobis distance matrix. The resulting cluster indexing efficiently adapts to relevance feedback from the user. Finally, these bounds are further tightened by recasting the problem of cluster-distance bounding as norm minimization with linear constraints. The resulting convex optimization problem complexity is polynomial in the number of dimensions.;The second focus is on fusion compression for selective retrieval of data from correlated source (sensor) networks. A fusion coding paradigm is presented where, given a fixed storage rate (space), the retrieval rate (time) is traded against distortion (quality). Only statistics of future queries are assumed to be apriori known. The optimality properties of the fusion coder are derived, and an iterative algorithm for design is presented. Lastly, these properties are exploited to reduce growth of design complexity with query-set size and designs are adapted to queries unknown during design phase.;The proposed fusion coder's design complexity scales exponentially with storage rate and number of blocks. Two techniques are presented to handle the growth in design complexity. In the first, by imposing "clever" constraints on fusion coder modules, a "Shared Descriptions" framework is obtained wherein, it is possible to trade storage rate, retrieval rate, distortion and complexity of design against each other. This allows design at high storage rates and provides gains over naive quantization schemes that have not been optimized for fusion coding. On the other hand, time correlations are exploited by a predictive coder, which is a low-complexity alternative to fusion coding over long blocks. The algorithm for predictive fusion coder design is based on the asymptotic closed loop principle and thereby, stable. The resulting predictive coder is matched to the closed-loop error statistics and provides substantial gains over memoryless fusion coding and joint compression schemes.
Keywords/Search Tags:Fusion coding, Search, Retrieval, Correlated, Coder
Related items