
Quantization For Approximate Nearest Neighbor Search

Posted on: 2018-04-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Zhang
Full Text: PDF
GTID: 1318330512482677
Subject: Control Science and Engineering

Abstract/Summary:
Nearest neighbor search has been a fundamental research topic in machine learning, computer vision and information retrieval. Unfortunately, exact nearest neighbor search is often impractical in the large-scale, high-dimensional case because the computational cost is very high, causing unaffordable latency before the search result is returned. Therefore, there has been a lot of interest in algorithms that perform approximate nearest neighbor (ANN) search. ANN search algorithms are in general developed from two main aspects: (1) comparing the query with only a small subset of database vectors through special data structures; and (2) accelerating the distance computation through compact codes such as hashing and quantization. This thesis concerns quantization algorithms for ANN search over high-dimensional data, which have shown competitive search accuracy with tractable storage and search cost. The novel algorithms in this thesis are as follows.

1. We present a novel compact coding approach, composite quantization, for ANN search. The idea is to use the composition of several elements selected from dictionaries to accurately approximate a vector, and to represent the vector by a short code composed of the indices of the selected elements. To efficiently compute the approximate distance of a query to a database vector using the short code, we introduce an extra constraint, constant inter-dictionary-element-product, so that approximating the distance using only the distances of the query to the selected elements is sufficient for nearest neighbor search (see the first sketch after this list). Experimental comparison with state-of-the-art algorithms demonstrates the efficacy of the proposed approach.

2. We develop a novel approach, called sparse composite quantization, which constructs sparse dictionaries to address the problem that the runtime cost of computing the distance table in composite quantization becomes non-negligible in real applications, e.g., reordering the candidates retrieved from an inverted index when handling very large-scale databases. The benefit of the proposed method is that the distance evaluation between the query and a dictionary element (a sparse vector) is accelerated using efficient sparse vector operations, so the cost of distance table computation is substantially reduced (see the second sketch after this list). Experimental results on large-scale ANN retrieval tasks and applications to object retrieval show that the proposed approach yields competitive performance.

3. We propose a cross-modal quantization approach, which is among the early attempts to introduce quantization into cross-modal search. Cross-modal similarity search is the problem of designing a search system that supports querying across content modalities, e.g., using an image to search for texts or using a text to search for images. The major contribution lies in jointly learning the quantizers for both modalities by aligning the quantized representations for each pair of image and text. In addition, our approach simultaneously learns the common space for both modalities in which quantization is conducted, enabling efficient and effective search using the Euclidean distance computed in the common space with fast distance table lookup. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance.

4. We present a compact coding approach, supervised quantization, to address the problem of searching for semantically similar images in a large database. Our approach simultaneously learns a feature selection that linearly transforms the database points into a low-dimensional discriminative subspace, and quantizes the data points in the transformed space. The optimization criterion is that the quantized points not only approximate the transformed points accurately, but are also semantically separable: the points belonging to a class lie in a cluster that does not overlap with the clusters of other classes, which is formulated as a classification problem. Experiments on several standard datasets show the superiority of our approach.

In summary, we propose four novel quantization-based algorithms for ANN search, covering unsupervised nearest neighbor search, cross-modal nearest neighbor search and supervised nearest neighbor search. The proposed approaches show superior performance over existing state-of-the-art methods.
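To make the composite quantization search step concrete, the following is a minimal sketch of query-time distance computation with a precomputed distance table. The array shapes, NumPy layout and function names are illustrative assumptions rather than the thesis implementation; the key point is that, under the constant inter-dictionary-element-product constraint, the cross terms contribute the same constant to every database vector, so ranking by summed table entries preserves the nearest-neighbor order.

```python
import numpy as np

def build_distance_table(query, dictionaries):
    """Precompute ||q - c_{m,k}||^2 for every dictionary element.

    dictionaries: (M, K, D) array -- M dictionaries, each with K elements of dimension D.
    Returns an (M, K) table of squared distances from the query to each element.
    """
    diff = dictionaries - query[None, None, :]
    return np.sum(diff ** 2, axis=-1)

def approximate_distances(query, dictionaries, codes):
    """Approximate ||q - x_n||^2 for every database code by table lookup.

    codes: (N, M) int array; database vector n is approximated by
    sum_m dictionaries[m, codes[n, m]].  The inter-dictionary cross terms are a
    shared constant under the composite quantization constraint, so dropping
    them does not change the ranking.
    """
    table = build_distance_table(query, dictionaries)   # (M, K)
    M = codes.shape[1]
    # For each database vector, sum one table entry per dictionary.
    return table[np.arange(M), codes].sum(axis=1)       # (N,)

# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
M, K, D, N = 4, 256, 128, 1000
dictionaries = rng.normal(size=(M, K, D))
codes = rng.integers(0, K, size=(N, M))
query = rng.normal(size=D)
top10 = np.argsort(approximate_distances(query, dictionaries, codes))[:10]
```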
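The sparse composite quantization contribution targets the cost of building this table in the first place. Below is a hedged sketch of how sparse dictionary elements make the query-to-element products cheap; the CSR storage, shapes and helper names are assumptions for illustration, not the thesis code.

```python
import numpy as np
from scipy import sparse

def build_table_sparse(query, sparse_dicts, elem_sq_norms):
    """Build the (M, K) table ||q - c_{m,k}||^2 using sparse dictionaries.

    sparse_dicts:  list of M scipy.sparse CSR matrices, each (K, D), whose rows
                   are the sparse dictionary elements.
    elem_sq_norms: (M, K) precomputed squared norms ||c_{m,k}||^2.
    Uses ||q - c||^2 = ||q||^2 - 2 q^T c + ||c||^2, where each q^T c product
    costs time proportional to the non-zeros instead of K * D.
    """
    q_sq = float(query @ query)
    rows = []
    for m, C in enumerate(sparse_dicts):
        inner = C @ query                      # sparse matrix-vector product, (K,)
        rows.append(q_sq - 2.0 * inner + elem_sq_norms[m])
    return np.stack(rows)                      # (M, K)

# Toy usage with random sparse dictionaries (illustrative only).
rng = np.random.default_rng(1)
M, K, D = 4, 256, 128
sparse_dicts = [sparse.random(K, D, density=0.1, format="csr", random_state=m)
                for m in range(M)]
elem_sq_norms = np.stack([np.asarray(C.multiply(C).sum(axis=1)).ravel()
                          for C in sparse_dicts])
query = rng.normal(size=D)
table = build_table_sparse(query, sparse_dicts, elem_sq_norms)
```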
Keywords/Search Tags: Approximate nearest neighbor search, quantization, composite quantization, near-orthogonal composite quantization, sparse composite quantization, collaborative quantization, cross-modal search, supervised quantization