Research On Hashing Learning Algorithm And Evaluation Metric For Bucket Lookup Optimization

Posted on:2022-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:M W Li

Full Text:PDF

GTID:2518306725493074

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Nearest neighbor search refers to finding the data points most similar to the given query point from the database.Nearest neighbor search is one of the basic problems in various fields,such as computer vision,recommendation systems,and machine learning.However,when the size of the database is large and the dimension of the data point is high,the time cost of accurately finding the data points closest to the given query point will be huge.Approximate nearest neighbor search plays a very important role in solving this problem.Due to its fast retrieval speed and low storage cost,hash learning has become one of the most widely used approximate nearest neighbor search methods.The purpose of hash learning is to map data points from the original high-dimensional feature representation to the low-dimensional binary hash codes that can preserve similarity by learning a hash function.Based on the binary hash codes,hash learning can achieve sub-linear query speed under ideal situations by utilizing hash bucket lookup.However,in real applications,the complexity of bucket lookup will increase at a geometric speed as the search Hamming radius and the length of binary hash codes increase.Hence,the optimization of bucket lookup is significant in real applications.This paper studies the optimization of bucket lookup from the following two aspects:hash learning algorithm and evaluation metric.The main contributions of this thesis are listed as follows:? Most of the existing hash learning methods are single code methods,these hash learning methods only learn one binary hash code for each data point.In the case of the single code,the hash bucket lookup might not achieve satisfactory retrieval efficiency.Existing single code hashing methods might fail to put similar image pairs to hash buckets with a small Hamming distance,which will lead to visit more hash buckets and reduce the retrieval efficiency.To solve this problem,this paper starts from the perspective of multiple codes,and proposes a novel hashing framework,called multiple code hashing(MCH),to learn multiple binary hash codes for each data point to improve retrieval efficiency.MCH is also a flexible framework that decomposes the learning process of multiple binary hash codes into two steps:the base hashing model learning step and the agent learning step.This decomposition simplifies the learning process and makes it easy for MCH to integrate different base hashing models.This paper verifies the effectiveness of MCH in large-scale image retrieval tasks with complex semantic information.Experiments demonstrate that MCH can significantly improve the retrieval efficiency of hash bucket lookup compared with existing hashing methods.? Most of the existing hash learning methods are single index methods,these hash learning methods use the complete binary hash code to construct only one single index.In the case of the single index,the hash bucket lookup might not achieve satisfactory retrieval efficiency.In the face of difficult retrieval tasks,existing single index hashing learning methods need to learn long binary hash codes to achieve suitable retrieval accuracy.This will lead to visit a large number of hash buckets and reduce the retrieval efficiency when retrieving similar data points.To solve this problem,this paper starts from the perspective of multi-index,and proposes a novel hashing method,called deep multi-index hashing(DMIH).DMIH integrates multi-index hash and deep multi-branch feature learning network into the same framework for the first time.Hence,the feature learning process and the hash code learning process can facilitate each other.Furthermore,a novel block-wise multiindex hashing table construction approach and a search-aware multi-index(SAMI)loss are proposed to further improve the retrieval efficiency.This paper verifies the effectiveness of DMIH in the task of person re-identification.Experiments show that,compared with existing baselines,DMIH can significantly improve the performance of hash bucket lookup in terms of both efficiency and accuracy.? In terms of the performance evaluation for hash bucket lookup,this paper points out for the first time the problems encountered when existing widely used evaluation metrics are used to evaluate hash bucket lookup.In order to solve these problems,this paper proposes a novel evaluation metric,called radius aware mean average precision(RAMAP).RAMAP considers the impact of the number of hash buckets that need to be visited while calculating accuracy,and provides a hash bucket lookup evaluation method that can consider both accuracy and efficiency.This paper verifies the effectiveness of RAMAP in large-scale image retrieval tasks.The experimental results show that,compared with the existing evaluation metrics,the RAMAP proposed in this paper can give a more reasonable and comprehensive performance evaluation for hash bucket lookup.

Keywords/Search Tags:

Hahing Learning, Hash Bucket Lookup, Retrieval Efficiency

PDF Full Text Request

Related items

1	An Efficient Parallel Name Lookup Scheme In Named Data Networking
2	A Hash-based Routing Lookup Algorithm
3	Research On Hash Learning And Its Application
4	Research On Deep Hash Learning For Efficient Image Retrieval
5	Research On Future Oriented High-performance Packet Lookup And Classification Technology
6	Design And Implementation Of Packet Lookup Technology Based On DPDK
7	Research On Fast Retrieval Method Of Image Data Based On Learning To Hash
8	Deep Learning To Hash For Large-Scale Multimedia Retrieval
9	Research Of Hash Image Retrieval Algorithm Based On Improved LLE
10	Research On Image Retrieval With Hash Codes Based On Deep Learning