Quantization Methods For Large Scale Approximate Nearest Neighbor Search On GPUs

Posted on:2021-03-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W Chen

Full Text:PDF

GTID:1488306107456524

Subject:Computer system architecture

Abstract/Summary:

PDF Full Text Request

In the age of the Internet,large scale approximated nearest neighbor(ANN)search has been widely applied in many real applications,such as massive multimedia data(image,video)retrieval,face recognition,etc.Therefore,there have been a lot interest in the methods that perform ANN search efficiently and effectively.Nowadays the high-performance ANN search methods adopt quantization methods for index structure and compact codes to improve the search accuracy and speed on Graphics Processing Units(GPUs).Because the memory space of GPUs is limited and the different computational mechanism from CPU,some quantization-based ANN search methods can not operate efficiently and effectively on GPUs.So there are a lot of margin to improve the accuracy and speed of the quantization-based ANN search methods on GPU.This thesis concerns the new quantization methods combined with GPUs for better ANN search accuracy and speed.For the existing problems in quantization methods on GPUs,we proposed three novel quantization methods for ANN search on GPUs.(1)For the low utilization of index region generated by product quantization(PQ)and high quantization error for compact codes based on PQ,we proposed vector and product quantization tree(V-PQT)indexing structure and plane quantization for compact codes.VPQT index structure is a two-level indexing structure which combines vector and product quantization method together.The index structure first use a few vector quantization(VQ)centroids to divide the high dimensional space into several index regions,then PQ methods generate a large number of second-level index regions in each first-level index region.The uniform distribution of local data can improve utilization rate of index region produced by VPQT index structure.Plane quantization projects each sub-vector of high dimensional data onto the nearest hyper-plane,and connect all the project points' ID together to represent the data.Because the project points on the plane are more close to the sub-vectors the PQ sub-centroids,Plane quantization can provide much lower quantization error for compact codes.Experimental results show that V-PQT index structure reduced the number of empty regions by 80%,and Plane quantization reduced the quantization error by 98%.The ANN search based on V-PQT index structure and plane quantization outperforms the state-ofthe-art method by 34% and 68% in the term of search accuracy on two public billion-scale datesets.(2)For the excessive memory overhead of VQ index structure,we proposed vector and line quantization(VLQ)index structure.The index structure use VQ as the first level index structure and line quantization(LQ)as the second level index structure.VQ index structure splits the data space into medium number of index regions.LQ index structure then divides each first level index regions into several second level index regions with small extra memory overhead.Thus VLQ index structure only needs a medium size of first level index structure to generate a large number of index regions.Experimental results show that VLQ index structure only need a quarter of VQ's memory overhead to reach the better indexing quality.The ANN search based on VLQ index structure outperforms the search system based on VQ by 14% in search accuracy and 5� in query speed.(3)For the contradiction between search accuracy and speed,we proposed ANN search algorithm based on vector and bilayer line quantization(VBLQ)index structure.The ANN method improves the contradiction between search accuracy and speed by higher indexing quality and lower complexity of distance computation.VBLQ index structure is a three layer index structure,which use VQ index structure as the first level index structure and a bi-layer line quantization as the second and the third level index structures.The ANN search algorithm use VBLQ index structure and improved approximated distance computation algorithm to reduce the contradiction between search accuracy and speed.Experimental results show that the ANN search method based on VBLQ index structure use less than 3%accuracy loss to increase the query speed by 100%.In general,this thesis explores quantization methods on GPU for large scale high dimensional data's similarity search.In this thesis,we propose three quantization methods for ANN search on GPU.The experimental results on two public billion-scale datasets show that the proposed search methods outperform the existing ANN search methods by a large margin.

Keywords/Search Tags:

Multimedia data, Large scale approximated nearest neighbor search, Quantization, GPUs

PDF Full Text Request

Related items

1	Learning And Indexing Structural Representations Of Large Scale High Dimensional Data
2	Deep Compact Coding For Multimedia Nearest Neighbor Search
3	Quantization For Approximate Nearest Neighbor Search
4	Unsupervised Binary Code Learning for Approximate Nearest Neighbor Search in Large-scale Datasets
5	Similarity Search On Large-scale High-dimensional Data
6	Research On Key Techniques Of Content-Based Large-Scale Image Retrieval
7	Research On Nearest Neighbor Search Method Based On Multi-stage Vector Quantization
8	Research On Large Scale Multimedia Data Retrieval Based On Hashing Algorithm
9	Research On Approximate Nearest Neighbor Search Method Based On Product Quantization
10	Binary Hashing And Quantization Methods For Approximate Nearest Neighbor Search