Font Size: a A A

Research On High-Dimensional Index In Large-Scale Image Databases

Posted on:2008-03-28Degree:DoctorType:Dissertation
Country:ChinaCandidate:J J LiangFull Text:PDF
GTID:1118360272966757Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
High dimensional index technique is one significant research issue for content-based image real-time retrieval in large-scale image databases. Facing the influence of the"curse of dimensionality", how to speed up retrieval in image databases with rational design of the expression,organization and extraction of the index is key problem in high-dimensional indexing schemes. The issues of the expression and organization of high-dimensional index are discussed in this paper and a series of simple and feasible index methods are proposed.Based on the research of expression and organization of current index methods, some key issues such as pruning and filtering of index tree, approximation presentation of high-dimensional vector and algorithm for fast similarity retrieval are discussed in detail, especially on their disadvantage, based on which some improvement approaches are proposed. With the development of high-dimensional index, high-dimensional main memory index arouses more and more interest, which techniques will be benefit to the research on the disk-based high-dimensional index. A new high-dimensional main memory index is designed in this paper.With large amount of points in the image databases, the retrieval performance degrades dramatically because the cost of distance computation with high dimensions increases greatly with many false hit points brought in the process of the range query. In this dissertation, a fast pruning and filtering technique is proposed based on the basis of the introduced concepts of vector ordering and active dimension. This method can solve two phases of pruning: one is in a sequence way; the other is in a point way, such that it can reduce the range that needed to search. By incorporating it into the Pyramid-Technique (PT), we can improve the retrieval efficiency of the PT.Efficient partition of data space is the basis of approximation presentation of a high-dimensional vector. Based on the techniques of approximate vector presentation and one dimensional transformation, an optimal compressive presentation of high-dimensional vector is proposed, called bit-code and distance based index. This representation enables two levels filtering. Experiments on large-scale image databases demonstrate a remarkable reduction of the amount of accessed vectors in similarity searches compared with the existing index schemes. To support efficient querying and retrieval on large-scale image databases, we propose a methodology exploring Local Bit-code Difference (LBD). On clustering the data space into a number of partitions, LBD extracts a distance and a simple bitmap representation called Bit Code (BC) for each point in the database with respect to the corresponding reference point. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the BC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between high-dimensional data can be avoided.In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve performance by reducing the number of processor cache misses that are incurred during a search operation. Considering the disadvantage of SA-Tree inefficient for main memory access, we present its variant called CSA-Tree, which is a multi-level structure where each level of the tree represents the data space at different dimensionalities by using Principal Component Analysis. Each level of the tree serves to prune the search space more efficiently as the reduced dimensions can better exploit the small cache line size. Moreover, the distance computation on lower dimensionality is less expensive. We conducted extensive experiments to evaluate the proposed structures against other methods. Our results show that the CSA-Tree is superior in most cases.
Keywords/Search Tags:Content-based image retrieval, High-dimensional index, High-dimensional main memory index, KNN search, Principal component analysis, Active dimension, Bit code
PDF Full Text Request
Related items