Research And Application Of Key Technologies For Complex Vector Search Based On Nearest Neighbor Graph

Posted on: 2024-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: L W Lv
Full Text: PDF
GTID: 2530307103475634
Subject: Computer technology
Abstract/Summary:
With the flourishing of the digital economy, unstructured data of many kinds is growing explosively. With the rapid development of deep learning, these data can be transformed into vectors so that their value can be exploited, which has made high-performance, low-cost vector search a research hotspot in both academia and industry. Existing vector search schemes are mainly tree-based, hash-based, or quantization-based. In recent years, vector search based on nearest neighbor graphs has shown significant performance advantages and application potential, and has become a research focus. However, current nearest neighbor graph-based vector search algorithms mainly target single-vector search. Application scenarios such as multi-vector joint search, attribute-constrained vector search, and their large-scale extensions pose significant challenges in terms of performance and cost. To address these issues, the main research content of this thesis is as follows:

(1) We propose an efficient multi-vector query framework based on nearest neighbor graphs (EMQ). Multi-vector joint search currently relies mainly on a separate scheme, in which each vector is searched independently and the results are merged afterwards. This approach suffers from low search efficiency and limited accuracy. EMQ addresses these issues by computing the similarity of each vector independently and fusing the similarities with weights. Based on the fusion results, it embeds the multiple vectors into a composite index and searches efficiently by routing on that index.

(2) We propose an efficient attribute-constrained vector search scheme based on EMQ. The two existing approaches, vector search followed by attribute filtering and attribute filtering followed by vector search, both suffer from low efficiency and insufficient accuracy. To address these issues, we propose a fused calculation of vector similarity and attribute-constraint similarity, together with a composite index construction method based on EMQ, which achieves efficient attribute-constrained vector search. The method is also a useful reference for efficient hybrid search over fused structured and unstructured data.

(3) We propose a large-scale complex vector search framework based on DiskANN and EMQ (Disk-EMQ). DiskANN combines an SSD-resident nearest neighbor graph index with an in-memory PQ (product quantization) index to provide an efficient, low-cost large-scale vector search scheme that has attracted widespread industry attention. Building on DiskANN and on research results (1) and (2), this thesis proposes a large-scale vector search scheme that uses SSD storage to support multi-vector joint search and attribute-constrained vector search in large-scale scenarios.

Building on these results, this thesis designs and implements a vector search system and builds three applications on it: multi-vector joint image search, attribute-constrained image search, and expert recommendation. The system and its applications validate the feasibility and effectiveness of the research conducted in this thesis.
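The weighted fusion and composite embedding described in (1) and (2) can be illustrated with a small sketch. The abstract does not specify the similarity measure, the weights, or how attributes are encoded, so everything below is an assumption made for illustration: squared Euclidean distance per vector, a weighted sum as the fusion, one-hot vectors for categorical attributes, and the hypothetical helper names `fused_distance`, `composite`, and `one_hot`. Under those assumptions, scaling each vector by the square root of its weight and concatenating yields a single composite vector whose ordinary squared distance equals the fused distance, so a single-vector nearest neighbor graph could in principle be built over the composites.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fused_distance(query_vecs, object_vecs, weights):
    """Weighted sum of per-vector squared distances (assumed fusion rule)."""
    return sum(
        w * squared_l2(q, o)
        for q, o, w in zip(query_vecs, object_vecs, weights)
    )


def composite(vecs, weights):
    """Scale each vector by sqrt(weight) and concatenate.

    The squared distance between two composites equals fused_distance
    of the original vector lists with the same weights.
    """
    out = []
    for v, w in zip(vecs, weights):
        scale = math.sqrt(w)
        out.extend(scale * x for x in v)
    return out


def one_hot(index, size):
    """Encode a categorical attribute as a one-hot vector (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    # Hypothetical object and query: an image vector, a text vector,
    # and one categorical attribute with three possible values.
    weights = [0.6, 0.4, 5.0]  # a large attribute weight penalizes mismatches
    query = [[0.1, 0.9], [0.3, 0.3, 0.4], one_hot(1, 3)]
    obj = [[0.2, 0.7], [0.1, 0.5, 0.4], one_hot(2, 3)]

    direct = fused_distance(query, obj, weights)
    via_composite = squared_l2(composite(query, weights), composite(obj, weights))
    print(direct, via_composite)  # the two values agree up to rounding
```

With a one-hot attribute, a mismatch adds twice the attribute weight to the fused distance, so a large weight pushes non-matching objects away from the query instead of filtering them in a separate step. Whether EMQ uses this particular construction is not stated in the abstract.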
Keywords/Search Tags: vector search, nearest neighbor graph, multi-vector joint search, attribute-constrained vector search