Similarity search for large-scale image datasets

Posted on:2007-10-30

Degree:Ph.D

Type:Thesis

University:Princeton University

Candidate:Lv, Qin

Full Text:PDF

GTID:2458390005486296

Subject:Computer Science

Abstract/Summary:

Content-based image similarity search is a difficult problem due to the high dimensionality and usually massive amount of image data. The main challenge is to achieve high-quality similarity search with high speed and low space usage. This thesis proposes several techniques to address the problem of building a similarity search system for large-scale image datasets. A prototype image search system, called CASS-Image (Content-Aware Search System for Images), has been implemented to demonstrate the effectiveness of these techniques.; The first contribution of this thesis is a sketch construction algorithm that converts high-dimensional feature vectors into bit vectors (sketches), such that the weighted (and thresholded) ℓ1 distance between two feature vectors can be estimated by the Hamming distance of their sketches. Experimental results show that using sketches can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality.; The second is a hash-perturbation based LSH (Locality Sensitive Hashing) technique for approximate nearest neighbor search in high dimensions. This technique probes multiple buckets in each hash table by perturbing the hashed value of the query object. Performance evaluations show that this method is both time and space efficient. It has a similar time efficiency as the basic LSH method while reducing the space requirement by a factor of five. Also, its time efficiency is twice that of the point-perturbation based LSH method.; The third is a multi-feature filtering algorithm for region-based image similarity search. This method uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. It works for both feature vectors and their sketches. It can also be combined with indexing techniques to further speed up the search process. Performance evaluations show that filtering is 4--13 times faster than the brute-force approach, while still maintaining good search quality.; This thesis also proposes a new region-based image similarity measure, EMD* match, which uses square-root region weights and region distance thresholding. Experimental results show that EMD* match is 27%--91% more effective than previous image similarity search techniques.

Keywords/Search Tags:

Related items

1	Research On Top-k String Similarity Search Based On Edit Distance
2	A Survey On Content-Based Image Retrieval Of Intelligent Techniques
3	Study On Match Similarity Search
4	Research On Approximate String Matching Techniques Based On MC Index Structure
5	Research On Hashing Methods Towards Large-scale Similarity Computation And Search
6	Representing And Indexing Trajectories For Efficient Similarity Search With TRI Framework
7	Research On Complex Distance Measure Based MapReduce Similarity Join Techniques
8	Research On Graph Search Problem Based On Edit Distance
9	Research On The Algorithms For String Similarity Search
10	Image Similarity Matching Based On Colour And Texture