Research On Hybrid Queries Of Structured And Unstructured Data Based On Proximity Graph

Posted on: 2022-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Wang
GTID: 2518306605996389
Subject: Software engineering
Abstract/Summary:
Since the start of the Internet era, structured data (numbers, symbols, labels, etc.) and unstructured data (images, video, text, etc.) have grown explosively. Efficient and accurate hybrid queries over these two types of data are a key technology for high-quality information retrieval, and they remain a pressing bottleneck in industry. Current hybrid query methods mainly search structured and unstructured data separately and then merge and rank the results: the former is handled by traditional database queries, while the latter is vectorized and searched with approximate nearest neighbor search (ANNS). However, this separated approach limits query efficiency and accuracy in large-scale data scenarios. To address these problems, our research can be summarized as follows.

(1) We design a native hybrid query (NHQ) method based on proximity graph-based ANNS. By computing and fusing the similarities of structured and unstructured data, we design a hybrid query framework with two modules: composite index and joint pruning. The framework inserts heterogeneous data into a composite index and then prunes jointly on that index to obtain hybrid query results efficiently. Notably, the framework can be applied to various proximity graph-based ANNS algorithms.

(2) To improve the performance of current proximity graph-based ANNS algorithms, we propose a navigable proximity graph (NPG) algorithm that optimizes the edge selection and routing strategies. Specifically, we optimize edge selection by considering both the distance to and the distribution of neighbors, and we design the routing strategy according to the characteristics of the different routing stages. Applying these strategies to index construction and search on the proximity graph yields the NPG algorithm, which shows state-of-the-art performance.

(3) We optimize and implement two hybrid query methods based on NPG for the two modules of the NHQ framework. In the composite index, we present an edge selection strategy that integrates the two types of data, in which the neighbors of a vertex remain uniformly distributed while staying close in distance. For joint pruning, we propose a two-stage routing strategy that adapts to the routing characteristics of the different stages. Experiments demonstrate that, at the same accuracy, the query efficiency of our hybrid query method is more than one order of magnitude higher than that of existing mainstream methods.

Finally, we apply our hybrid query method to image retrieval and expert retrieval systems. For image retrieval, our approach improves the accuracy of the results through additional label constraints while keeping the same retrieval efficiency. For expert retrieval, compared with existing methods, our solution accurately retrieves experts based on "technical description text + structured labels"; the optimized hybrid query method also shows faster index construction and a state-of-the-art efficiency vs. accuracy trade-off.
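
The idea of fusing structured and unstructured similarity and then searching a single graph can be illustrated with a toy sketch. This is not the thesis's NHQ or NPG implementation: the fusion formula (Euclidean distance plus a weighted count of mismatched labels), the `weight` parameter, the function names `fused_distance` and `greedy_search`, and the hand-built four-vertex graph are all assumptions made for illustration only.

```python
import heapq
import math


def fused_distance(vec_a, attrs_a, vec_b, attrs_b, weight=1.0):
    """Assumed fusion: vector distance plus a penalty per mismatched label."""
    vec_dist = math.dist(vec_a, vec_b)
    mismatches = sum(1 for x, y in zip(attrs_a, attrs_b) if x != y)
    return vec_dist + weight * mismatches


def greedy_search(graph, vectors, attrs, query_vec, query_attrs,
                  entry, k=1, ef=4, weight=1.0):
    """Best-first search on an adjacency-list graph using the fused distance."""
    def dist(i):
        return fused_distance(vectors[i], attrs[i], query_vec, query_attrs, weight)

    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap of vertices to expand
    results = [(-dist(entry), entry)]     # max-heap (negated) of best ef vertices
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded vertex is worse than the worst result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    ranked = sorted((-neg_d, n) for neg_d, n in results)
    return [n for _, n in ranked][:k]


# Toy data: four points on a unit square, each with one label (hypothetical).
vectors = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
attrs = [("cat",), ("dog",), ("cat",), ("dog",)]
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

hybrid = greedy_search(graph, vectors, attrs, (0.9, 0.2), ("cat",), entry=0, weight=10.0)
vector_only = greedy_search(graph, vectors, attrs, (0.9, 0.2), ("cat",), entry=0, weight=0.0)
print(hybrid, vector_only)
```

With a large `weight`, the search prefers the nearest vertex whose label matches the query; with `weight=0.0` it reduces to plain vector search and can return a vertex with the wrong label. A single pass over one graph replaces the separate "filter, search, merge" steps of the separated approach described above.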
Keywords/Search Tags: unstructured data, approximate nearest neighbor search, hybrid query, proximity graph