Research On High-Dimensional Index In Large-Scale Image Databases

Posted on:2008-03-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J J Liang

Full Text:PDF

GTID:1118360272966757

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

High dimensional index technique is one significant research issue for content-based image real-time retrieval in large-scale image databases. Facing the influence of the"curse of dimensionality", how to speed up retrieval in image databases with rational design of the expression,organization and extraction of the index is key problem in high-dimensional indexing schemes. The issues of the expression and organization of high-dimensional index are discussed in this paper and a series of simple and feasible index methods are proposed.Based on the research of expression and organization of current index methods, some key issues such as pruning and filtering of index tree, approximation presentation of high-dimensional vector and algorithm for fast similarity retrieval are discussed in detail, especially on their disadvantage, based on which some improvement approaches are proposed. With the development of high-dimensional index, high-dimensional main memory index arouses more and more interest, which techniques will be benefit to the research on the disk-based high-dimensional index. A new high-dimensional main memory index is designed in this paper.With large amount of points in the image databases, the retrieval performance degrades dramatically because the cost of distance computation with high dimensions increases greatly with many false hit points brought in the process of the range query. In this dissertation, a fast pruning and filtering technique is proposed based on the basis of the introduced concepts of vector ordering and active dimension. This method can solve two phases of pruning: one is in a sequence way; the other is in a point way, such that it can reduce the range that needed to search. By incorporating it into the Pyramid-Technique (PT), we can improve the retrieval efficiency of the PT.Efficient partition of data space is the basis of approximation presentation of a high-dimensional vector. Based on the techniques of approximate vector presentation and one dimensional transformation, an optimal compressive presentation of high-dimensional vector is proposed, called bit-code and distance based index. This representation enables two levels filtering. Experiments on large-scale image databases demonstrate a remarkable reduction of the amount of accessed vectors in similarity searches compared with the existing index schemes. To support efficient querying and retrieval on large-scale image databases, we propose a methodology exploring Local Bit-code Difference (LBD). On clustering the data space into a number of partitions, LBD extracts a distance and a simple bitmap representation called Bit Code (BC) for each point in the database with respect to the corresponding reference point. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the BC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between high-dimensional data can be avoided.In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve performance by reducing the number of processor cache misses that are incurred during a search operation. Considering the disadvantage of SA-Tree inefficient for main memory access, we present its variant called CSA-Tree, which is a multi-level structure where each level of the tree represents the data space at different dimensionalities by using Principal Component Analysis. Each level of the tree serves to prune the search space more efficiently as the reduced dimensions can better exploit the small cache line size. Moreover, the distance computation on lower dimensionality is less expensive. We conducted extensive experiments to evaluate the proposed structures against other methods. Our results show that the CSA-Tree is superior in most cases.

Keywords/Search Tags:

Content-based image retrieval, High-dimensional index, High-dimensional main memory index, KNN search, Principal component analysis, Active dimension, Bit code

PDF Full Text Request

Related items

1	Research On Vector Approximation Method In High-dimensional Index Technology
2	Research Of High-dimensional Index Structure Based On One-dimensional Mapping And Its Application
3	The Research Of Content-based Image Indexing And Browsing Algorithm
4	Research On High-dimensional Index In Large-scale Image Retrieval
5	Research On Algorthms Of High-dimensional Multimedia Data Indexing
6	Research On High-dimensional Indexing In Content-Based Image Retrieval
7	Medical Image Retrieval Technology Based On Multiple-inverted File
8	Research On Indexing And System Implementation In Content-Based Image Retrieval
9	Research On Dimension Reduction Methods Of High Dimensional Data
10	Research On Index Techniques For Content-based Medical Image Retrieval