
Deep Label-based Hashing For Cross-modal Retrieval

Posted on: 2020-03-03    Degree: Master    Type: Thesis
Country: China    Candidate: P F Zhao    Full Text: PDF
GTID: 2428330572488704    Subject: Computer technology
Abstract/Summary:
With the explosive growth of multimedia data, it is difficult in many scenarios to perform exact nearest neighbor search because of the large computational cost. Therefore, approximate nearest neighbor (ANN) search has recently attracted more and more attention. In particular, many hashing-based ANN search methods have been proposed: they first map samples into a Hamming space while preserving the similarity of the original space, and then perform the search with XOR operations. Consequently, the search becomes extremely efficient and the storage cost is also greatly reduced. More recently, many cross-modal hashing methods have been proposed, which are able to perform search across different modalities, e.g., using images to retrieve texts.

According to the feature extraction strategy, existing cross-modal hashing methods can be divided into two categories: shallow and deep. In shallow methods, feature extraction is independent of the learning of binary codes and hash functions, and the representational ability of the hand-crafted features is limited, so it is hard for them to achieve satisfactory performance. In contrast, deep methods leverage the powerful feature learning ability of deep neural networks and usually learn the features and hash functions simultaneously; consequently, they are able to achieve better performance than shallow methods. However, some issues still need to be addressed. Deep methods generally employ the images, the texts, and the pairwise similarity to learn the hash functions and the unified binary codes. However, texts often contain a lot of noise, and the contents of an image-text pair are not always consistent, so the generated binary codes may also contain much noise. In addition, these methods exploit the semantic information only by constructing a similarity matrix, which may lose some important information.

To address these issues, in this paper we present a novel deep cross-modal hashing method, named Fast Label-Preferred Hashing (FLPH for short). Specifically, it first learns the hash functions and the unified binary codes for the images and the corresponding labels; thereafter, it learns the hash functions for the texts. Because the contents of images and labels are highly consistent, FLPH is able to generate high-quality binary codes. Moreover, based on a proposed iterative optimization algorithm, the binary codes can be generated without relaxation, which may further reduce the quantization error. Extensive experiments on two benchmark datasets demonstrate that FLPH outperforms several state-of-the-art cross-modal hashing methods; in particular, the training of FLPH is much faster than that of other deep hashing models.
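Once the binary codes are learned (by FLPH or any other hashing method), cross-modal retrieval in the Hamming space reduces to XOR plus bit counting, as outlined in the background above. The following is a minimal NumPy sketch of that retrieval step; the code length, database size, and function names are illustrative and are not taken from the thesis.

```python
import numpy as np

def hamming_distance(a, b):
    """Hamming distance between two packed binary codes via XOR + popcount."""
    # Codes are packed into unsigned 8-bit integers (output of np.packbits).
    return np.unpackbits(np.bitwise_xor(a, b)).sum()

def search(query_code, database_codes, top_k=10):
    """Return indices of the top_k database codes closest to the query."""
    dists = np.array([hamming_distance(query_code, c) for c in database_codes])
    return np.argsort(dists)[:top_k]

# Toy example: one 64-bit query code (e.g., from an image) searched against
# a database of 1000 codes (e.g., from texts).
rng = np.random.default_rng(0)
db = np.packbits(rng.integers(0, 2, size=(1000, 64)), axis=1)
query = np.packbits(rng.integers(0, 2, size=(1, 64)), axis=1)[0]
print(search(query, db, top_k=5))
```

Because the distance is a bitwise XOR followed by a bit count, both the comparison cost and the storage per sample are tiny compared with searching in the original feature space.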
On the other hand, supervised deep cross-modal hashing methods have achieved very promising performance. They usually employ all the modalities together with the labels to learn the hash functions and the binary codes. However, the original modalities, e.g., images and texts, often contain a lot of noise, so the generated binary codes may also contain much noise. In addition, these methods exploit the semantic information only by constructing a similarity matrix, which may lose some useful information. To address these issues, in this paper we present another deep cross-modal hashing method, namely Label-Preferred Deep Hashing (LPDH for short). Specifically, it first learns the unified binary codes from the labels; thereafter, it generates hash functions for all modalities, which can be trained in parallel. Since the labels contain almost no noise, LPDH is able to generate high-quality binary codes, and the labels are fully exploited in generating them. Moreover, thanks to the simple loss function and the high-quality binary codes, LPDH can learn the hash functions for the different modalities in parallel. Extensive experiments on two benchmark datasets demonstrate that LPDH outperforms several state-of-the-art cross-modal hashing methods; in particular, verification experiments demonstrate that fully exploiting the labels to capture the similarity between samples can suppress the influence of noise.
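Both FLPH and LPDH follow a label-first, two-stage scheme: first derive the unified binary codes from clean supervisory information, then fit per-modality hash functions to those codes. Below is a minimal, hypothetical Python sketch of such a pipeline; the random-projection surrogate for the code-learning stage, the network sizes, and the plain squared loss are illustrative assumptions and do not reproduce the thesis's actual iterative, relaxation-free optimization.

```python
import torch
import torch.nn as nn

# Stage 1 (label branch): obtain K-bit unified codes B from the label matrix.
# A random label projection followed by sign() is used here only to keep the
# sketch self-contained; the thesis learns B with its own discrete optimization.
def codes_from_labels(labels, n_bits=64):
    proj = torch.randn(labels.shape[1], n_bits)
    return torch.sign(labels @ proj + 1e-8)           # B in {-1, +1}^{N x K}

# Stage 2: each modality gets its own hash network trained to regress B,
# so the modality branches can be trained independently (in parallel).
class HashNet(nn.Module):
    def __init__(self, in_dim, n_bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, n_bits), nn.Tanh())         # tanh approximates sign

    def forward(self, x):
        return self.net(x)

def train_modality(features, codes, epochs=10, lr=1e-3):
    model = HashNet(features.shape[1], codes.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(features) - codes) ** 2).mean()  # simple regression loss
        loss.backward()
        opt.step()
    return model

# Toy data: 1000 samples, 4096-D image features, 1386-D text features,
# 24 labels (all dimensions are illustrative, not from the thesis).
N, n_bits = 1000, 64
labels = (torch.rand(N, 24) > 0.9).float()
B = codes_from_labels(labels, n_bits)
image_model = train_modality(torch.randn(N, 4096), B)  # this branch and the
text_model = train_modality(torch.randn(N, 1386), B)   # text branch are independent
```

In the actual methods, the code-learning stage is an iterative discrete optimization over images plus labels (FLPH) or labels alone (LPDH), and the per-modality hash functions are deep networks rather than the small regressor shown here; the sketch only illustrates why the second stage can be parallelized once the unified codes are fixed.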
Keywords/Search Tags: Learning to hash, Cross-modal retrieval, Deep hashing, Approximate nearest neighbor search