Font Size: a A A

Research On Hashing Learning Algorithm And Evaluation Metric For Bucket Lookup Optimization

Posted on:2022-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:M W LiFull Text:PDF
GTID:2518306725493074Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Nearest neighbor search refers to finding the data points most similar to the given query point from the database.Nearest neighbor search is one of the basic problems in various fields,such as computer vision,recommendation systems,and machine learning.However,when the size of the database is large and the dimension of the data point is high,the time cost of accurately finding the data points closest to the given query point will be huge.Approximate nearest neighbor search plays a very important role in solving this problem.Due to its fast retrieval speed and low storage cost,hash learning has become one of the most widely used approximate nearest neighbor search methods.The purpose of hash learning is to map data points from the original high-dimensional feature representation to the low-dimensional binary hash codes that can preserve similarity by learning a hash function.Based on the binary hash codes,hash learning can achieve sub-linear query speed under ideal situations by utilizing hash bucket lookup.However,in real applications,the complexity of bucket lookup will increase at a geometric speed as the search Hamming radius and the length of binary hash codes increase.Hence,the optimization of bucket lookup is significant in real applications.This paper studies the optimization of bucket lookup from the following two aspects:hash learning algorithm and evaluation metric.The main contributions of this thesis are listed as follows:? Most of the existing hash learning methods are single code methods,these hash learning methods only learn one binary hash code for each data point.In the case of the single code,the hash bucket lookup might not achieve satisfactory retrieval efficiency.Existing single code hashing methods might fail to put similar image pairs to hash buckets with a small Hamming distance,which will lead to visit more hash buckets and reduce the retrieval efficiency.To solve this problem,this paper starts from the perspective of multiple codes,and proposes a novel hashing framework,called multiple code hashing(MCH),to learn multiple binary hash codes for each data point to improve retrieval efficiency.MCH is also a flexible framework that decomposes the learning process of multiple binary hash codes into two steps:the base hashing model learning step and the agent learning step.This decomposition simplifies the learning process and makes it easy for MCH to integrate different base hashing models.This paper verifies the effectiveness of MCH in large-scale image retrieval tasks with complex semantic information.Experiments demonstrate that MCH can significantly improve the retrieval efficiency of hash bucket lookup compared with existing hashing methods.? Most of the existing hash learning methods are single index methods,these hash learning methods use the complete binary hash code to construct only one single index.In the case of the single index,the hash bucket lookup might not achieve satisfactory retrieval efficiency.In the face of difficult retrieval tasks,existing single index hashing learning methods need to learn long binary hash codes to achieve suitable retrieval accuracy.This will lead to visit a large number of hash buckets and reduce the retrieval efficiency when retrieving similar data points.To solve this problem,this paper starts from the perspective of multi-index,and proposes a novel hashing method,called deep multi-index hashing(DMIH).DMIH integrates multi-index hash and deep multi-branch feature learning network into the same framework for the first time.Hence,the feature learning process and the hash code learning process can facilitate each other.Furthermore,a novel block-wise multiindex hashing table construction approach and a search-aware multi-index(SAMI)loss are proposed to further improve the retrieval efficiency.This paper verifies the effectiveness of DMIH in the task of person re-identification.Experiments show that,compared with existing baselines,DMIH can significantly improve the performance of hash bucket lookup in terms of both efficiency and accuracy.? In terms of the performance evaluation for hash bucket lookup,this paper points out for the first time the problems encountered when existing widely used evaluation metrics are used to evaluate hash bucket lookup.In order to solve these problems,this paper proposes a novel evaluation metric,called radius aware mean average precision(RAMAP).RAMAP considers the impact of the number of hash buckets that need to be visited while calculating accuracy,and provides a hash bucket lookup evaluation method that can consider both accuracy and efficiency.This paper verifies the effectiveness of RAMAP in large-scale image retrieval tasks.The experimental results show that,compared with the existing evaluation metrics,the RAMAP proposed in this paper can give a more reasonable and comprehensive performance evaluation for hash bucket lookup.
Keywords/Search Tags:Hahing Learning, Hash Bucket Lookup, Retrieval Efficiency
PDF Full Text Request
Related items