
Research On Approximate Nearest Neighbor Search Method Based On Product Quantization

Posted on: 2022-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
Full Text: PDF
GTID: 2518306521479944
Subject: Computer application technology
Abstract/Summary:
Information technology is developing ever faster, making people's life and work more convenient. With the deepening reach of Internet technology, the speed and scale at which information is collected and transmitted have reached unprecedented levels, but the world has also entered an era of information overload. Faced with massive volumes of documents, images, videos, and other data, how to store and query the information we need quickly and efficiently is currently a hot research topic both in China and abroad. Within this field, similarity search, also known as approximate nearest neighbor (ANN) search, is widely used across many areas of information life, and ANN search methods have long been a focus of scholars worldwide.

Owing to its low memory consumption and fast retrieval speed, product quantization, together with its many improved and extended variants, has been widely applied to ANN search. Product quantization is a divide-and-conquer algorithm: it decomposes a high-dimensional vector space into several low-dimensional subspaces and quantizes each subspace separately. However, product quantization still has shortcomings. It implicitly assumes that the information energy of the subspaces into which the data vectors are divided is balanced, whereas in practice the information energy of these subspaces may be unbalanced. The performance of product quantization therefore depends heavily on the distribution of the original data: if the amount of information across subspaces is extremely unbalanced, product quantization produces poor results. To further optimize product quantization and related methods, this thesis addresses the problem from two different perspectives.

The first perspective removes the unbalanced information from the data space, and the mean-removed additive quantization (MRAQ) method is proposed. First, a scalar quantizer independently learns the mean value of each data point, and a residual vector is obtained by subtracting that mean. The residual vectors are then encoded by additive quantization. By quantizing and removing the mean, the unbalanced information in the data space is eliminated, and the remaining part is balanced in the sense that its statistical mean, and the expected value of each dimension, is zero. Experimental results show that MRAQ substantially improves accuracy and outperforms classical methods.

The second perspective exploits the unbalanced information energy of the subspaces by allocating a different number of bits to each subspace according to its information content. Product quantization assigns the same number of bits to every subspace, but when the amount of information per subspace is unbalanced, this even distribution of bits can lead to larger quantization distortion. An optimized bit-allocation product quantization method is therefore proposed. The method first allocates bits to each subspace according to the weight of that subspace's information within the total information of the data set, and then adjusts the allocation by taking quantization distortion as the objective function to obtain the final bit assignment. Experiments verify the effectiveness of the method and demonstrate further performance gains.
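The divide-and-conquer structure of product quantization described above can be sketched as follows. This is an illustrative minimal implementation, not the thesis code: each vector is split into `M` subvectors, each subspace gets its own small codebook learned by plain k-means, and a vector is encoded as `M` codeword indices. All function names and parameters here are assumptions for the sketch.

```python
# Minimal product-quantization sketch (illustrative, not the thesis code).
import numpy as np

def train_pq(data, M, K, iters=10, seed=0):
    """Learn one K-word codebook per subspace via plain k-means."""
    rng = np.random.default_rng(seed)
    N, D = data.shape
    d = D // M                        # dimension of each subspace
    codebooks = []
    for m in range(M):
        sub = data[:, m * d:(m + 1) * d]
        # initialize centroids from random training points
        cent = sub[rng.choice(N, K, replace=False)].copy()
        for _ in range(iters):
            # assign each subvector to its nearest centroid, then update
            dist = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            labels = dist.argmin(1)
            for k in range(K):
                pts = sub[labels == k]
                if len(pts):
                    cent[k] = pts.mean(0)
        codebooks.append(cent)
    return codebooks

def encode(x, codebooks):
    """Quantize x to M codeword indices, one per subspace."""
    d = len(x) // len(codebooks)
    return [int(((x[m * d:(m + 1) * d] - cb) ** 2).sum(1).argmin())
            for m, cb in enumerate(codebooks)]

def decode(code, codebooks):
    """Reconstruct an approximation of x from its M indices."""
    return np.concatenate([cb[c] for c, cb in zip(code, codebooks)])
```

Because each subspace is quantized independently, the memory cost is `M * K * d` floats for the codebooks plus `M` small integers per stored vector, which is the low-memory property the abstract refers to.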
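The mean-removal step of the first perspective can be sketched in isolation. The thesis pairs this step with additive quantization of the residuals, which is omitted here for brevity; the sketch only shows how a uniform scalar quantizer over the observed vector means yields a quantized mean plus a residual whose mean is close to zero. The quantizer design and function names are assumptions made for illustration.

```python
# Sketch of the mean-removal step (illustrative; the additive quantization
# of the residual vectors used by MRAQ is not shown here).
import numpy as np

def train_scalar_quantizer(means, L=32):
    """Uniform L-level scalar quantizer over the observed range of means."""
    return np.linspace(means.min(), means.max(), L)

def quantize_mean(mu, levels):
    """Snap a mean value to its nearest quantizer level."""
    return levels[np.abs(levels - mu).argmin()]

def mean_removed(x, levels):
    """Split x into a quantized mean and a near-zero-mean residual."""
    mu_q = quantize_mean(x.mean(), levels)
    return mu_q, x - mu_q
```

After this step the residual's mean deviates from zero by at most half a quantizer step, which is the sense in which the "unbalanced information" carried by the mean has been removed before the residual is quantized.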
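The second perspective's initial allocation step can also be sketched. Here subspace variance is used as a simple proxy for "information energy", and bits are handed out in proportion to each subspace's share of the total; the thesis then refines such an allocation against a quantization-distortion objective, which this sketch does not attempt. The proportional rule and function name are assumptions for illustration.

```python
# Sketch of variance-proportional bit allocation across subspaces
# (illustrative first step; the distortion-driven adjustment is omitted).
import numpy as np

def allocate_bits(data, M, total_bits):
    """Assign bits to each of M subspaces by its share of total variance."""
    D = data.shape[1]
    d = D // M
    var = np.array([data[:, m * d:(m + 1) * d].var() for m in range(M)])
    w = var / var.sum()                      # information weight per subspace
    bits = np.floor(w * total_bits).astype(int)
    # hand any leftover bits to the highest-weight subspaces
    for i in np.argsort(-w)[: total_bits - bits.sum()]:
        bits[i] += 1
    return bits
```

When one subspace carries far more energy than the others, this rule concentrates the bit budget there instead of splitting it evenly, which is exactly the imbalance that uniform allocation in standard product quantization fails to exploit.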
Keywords/Search Tags:product quantization, approximate nearest neighbor search, vector quantization, mean-removed, bit allocation