Research On Key Technologies Of Deep Cross-Modal Hashing

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2428330602964564
Subject: Computer software and theory

Abstract/Summary:
Nowadays, with the explosive growth of multimedia data, cross-modal retrieval has become a hot research topic in multimedia computing and information retrieval. Cross-modal retrieval takes data of one modality as the query to retrieve relevant data in other modalities. It breaks the limitation of traditional uni-modal retrieval, which mainly focuses on image-retrieves-image or text-retrieves-text tasks, and thus opens up a new way to effectively support multi-modal data retrieval. However, large-scale cross-modal retrieval faces great challenges in storage cost and retrieval speed. Cross-modal hashing projects high-dimensional multi-modal data (such as text, audio, image, and video) into a common low-dimensional Hamming space that preserves both the inter-media and intra-media consistency of the original feature space. It effectively accelerates large-scale cross-modal retrieval and reduces storage cost. Deep cross-modal hashing performs deep representation learning and hash code learning simultaneously, which significantly improves the retrieval accuracy of cross-modal hashing.

Although existing deep cross-modal hashing methods have achieved promising results, they still suffer from two crucial problems: 1) Most existing unsupervised deep cross-modal hashing methods lack the guidance of semantic labels, which limits the semantic information of the learned hash codes and thus directly decreases retrieval accuracy. 2) Existing supervised deep cross-modal hashing methods handle the two cross-modal retrieval tasks (image retrieves text and text retrieves image) equally, simply learning the same pair of hash functions in a symmetric way. Under such circumstances, the differences between the retrieval tasks are ignored, which might lead to sub-optimal performance.

To address the first problem, this thesis proposes an Unsupervised Deep Cross-modal Hashing with Virtual Label Regression (UDCH-VLR) method. It presents a novel unified learning framework that jointly performs deep hash function training, virtual label learning, and regression. Specifically, it learns unified hash codes via collaborative matrix factorization on the multi-modal deep representations to preserve the shared multi-modal semantics. Moreover, it incorporates virtual label learning into the objective function and simultaneously regresses the learned virtual labels to the hash codes, which provides strong semantic supervision for hash learning and improves cross-modal retrieval performance. Finally, it devises an alternating optimization strategy to directly update the deep hash functions and the discrete binary codes, so that the discriminative capability of the hash codes is progressively enhanced with iterative learning. Extensive experiments on three publicly available cross-modal retrieval datasets validate the effectiveness of the proposed method.

To address the second problem, this thesis proposes a novel Task-Adaptive Asymmetric Deep Cross-Modal Hashing (TA-ADCMH) method that learns task-adaptive hash functions for the two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning. Unlike existing deep cross-modal hashing approaches, the learning framework jointly optimizes the semantic preservation from multi-modal features to hash codes and the semantic regression from the query-specific representation to the explicit labels. The learned hash codes can effectively preserve the multi-modal semantic correlations while adaptively capturing the query semantics. Besides, it designs an efficient discrete optimization strategy to directly learn the binary hash codes, which alleviates the quantization errors caused by relaxation. Extensive experiments on two publicly available cross-modal retrieval datasets demonstrate the superiority of the proposed method from various aspects.
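As context for the abstract above, the following is a minimal sketch (not the thesis implementation; all function names and sizes are illustrative) of why a shared Hamming space makes cross-modal retrieval fast: real-valued embeddings from each modality are binarized into short codes, and a text query retrieves images by ranking Hamming distances, which reduce to cheap XOR-and-count operations.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(z):
    """Map real-valued embeddings to {0, 1} hash codes via the sign."""
    return (z > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to a single query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

# 64-bit codes for one text query and a toy "database" of 1000 images.
# In a deep cross-modal hashing method these embeddings would come from
# the learned text and image networks; here they are random stand-ins.
text_query = binarize(rng.standard_normal(64))
image_db = binarize(rng.standard_normal((1000, 64)))

ranking = hamming_rank(text_query, image_db)
assert ranking.shape == (1000,)
```

The key design point the abstract relies on is that both modalities land in the *same* code space, so a text code can be compared directly against image codes; storage is 64 bits per item rather than a high-dimensional float vector.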
Keywords/Search Tags:Cross-modal Retrieval, Deep Discrete Hashing, Virtual Labels, Task-Adaptive, Asymmetric Deep Hashing Learning