Font Size: a A A

Research On K-nearest Neighbor Query Technology On High-dimensional Data

Posted on:2022-12-30Degree:MasterType:Thesis
Country:ChinaCandidate:J XuFull Text:PDF
GTID:2518306611495684Subject:Automation Technology
Abstract/Summary:PDF Full Text Request
As a basic problem in the field of information technology,k-nearest neighbor query is widely used in all walks of life.For example,it is used as an approximate query in information retrieval,as a classification in machine learning,and has countless applications in databases,computer vision and other fields.With the penetration of computers into various industries and fields,data storage is cheaper and more convenient.These characteristics have resulted in the characteristics of large scale,rich types and fast generation of current data.These data characteristics bring great challenges to data management,analysis and utilization.The k-nearest neighbor query technique studied in this paper is greatly affected by the data scale and data dimension,which has always been a hot issue in the field of big data.At present,the research on k-nearest neighbor query technology can be summarized as: tree-based,graph-based and hash-based technology.They all have their own advantages due to their different structures,but they cannot meet the requirements of short query time,small index space,and accurate query results at the same time.The reason is that the index structure currently used to support k-nearest neighbor query is static,and the performance of the index structure cannot be improved based on historical query experience,and the impact of query workload changes on the performance of k-nearest neighbor query results is ignored.To sum up,this paper proposes a k-nearest neighbor query technology with short query time,small index space and accurate query results based on historical query experience and changes in query workload.The main contributions of this paper are as follows:1)Aiming at the problem that k-nearest neighbor query technology cannot improve the performance of index structure based on historical query experience,this paper designs an index structure with fast query speed and small index structure called HCTree,and sets up an index optimization algorithm based on historical query.The HCTree optimized based on historical query has a significant improvement in accuracy,and the optimized index structure is called d-HCTree.2)Aiming at the problem that the index structure cannot adapt to the change of query workload,this paper proposes two data distribution optimization algorithms.The data distribution optimization algorithm based on historical query can effectively reduce the query time of d-HCTree and make d-HCTree adapt to changes in query workload.The incremental data distribution optimization algorithm can effectively improve the query accuracy of HCTree and adapt to changes in query workload.3)Aiming at the problem that the index structure adapts to the slow change of query workload,this paper proposes a data distribution optimization algorithm based on reinforcement learning.The reinforcement learning model not only effectively improves the accuracy of HCTree,but also effectively shortens the time spent on data distribution optimization,flexibly selects the scale of the data distribution optimization problem,and adjusts the data distribution in real time.
Keywords/Search Tags:approximate k-nearest neighbor, data distribution, reinforcement learning
PDF Full Text Request
Related items