Triplet-Based Deep Hashing Network For Cross-Modal Retrieval

Posted on: 2019-11-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Z J Chen | Full Text: PDF
GTID: 2428330572952225 | Subject: Circuits and Systems
Abstract/Summary:
Over the past decade, with the rapid development of Internet technology and social networks, millions of multimedia items have been generated every day. Multimedia data on the Internet exist in different forms and come from heterogeneous sources. For example, a web page may contain data of multiple modalities, such as text, images, and video. Although these data come from different modalities, they are strongly semantically correlated. Cross-modal retrieval is designed for scenarios in which the query and the retrieved results come from different modalities. It faces two main technical problems: first, how to extract features from the samples of each modality that carry richer semantic information; second, how to bridge the semantic gap between modalities. Many cross-modal retrieval methods have been proposed to solve these problems. Among them, hashing methods have attracted extensive attention from industry and academia because of their fast retrieval and low memory cost. Cross-modal hashing methods map the high-dimensional original data to compact hash codes and then measure the similarity between cross-modal data by the Hamming distance between codes, which is computed with fast bit-wise XOR operations. To address the two problems in cross-modal retrieval, we propose two cross-modal hashing retrieval methods, as follows:

(1) A cross-modal retrieval method based on a triplet deep hashing network is proposed. To extract effective cross-modal sample features, we integrate feature learning and hash code learning into a unified end-to-end deep neural network. The proposed method also uses triplet labels as supervised information; triplet labels can capture multiple high-level similarities more flexibly and generate different constraints. Furthermore, organizing the data into triplets enlarges the amount of training data, which alleviates the over-fitting problem. This method effectively improves the accuracy of cross-modal retrieval.

(2) A cross-modal retrieval method based on a graph-regularized triplet deep hashing network is proposed. Building on the above method, we use triplet labels to establish several loss functions: an inter-modal triplet loss function, an intra-modal triplet loss function, and a graph regularization loss function. The inter-modal triplet loss function is used to construct the semantic relationship between different modalities. The intra-modal triplet loss function is used to enhance the discriminability of the hash codes. The graph regularization loss function is used to establish the semantic similarity between the original space and the Hamming space. This method alleviates the semantic gap between cross-modal data and effectively improves the retrieval accuracy.
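
The Hamming-distance retrieval step and the idea of a triplet constraint described in the abstract can be illustrated with a minimal Python sketch. All function names here are hypothetical. The hinge form of the triplet loss and the margin of 2 bits are assumptions chosen for illustration; the abstract does not state the exact loss the thesis uses, and a trainable network would work on relaxed real-valued codes rather than on the final binary codes shown here.

```python
from typing import List, Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into a single integer hash code."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """XOR marks the bit positions that differ; counting them gives the distance."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


def triplet_hinge_loss(anchor: int, positive: int, negative: int,
                       margin: int = 2) -> int:
    """Assumed hinge-style triplet loss on Hamming distances.

    The loss is zero once the negative is at least `margin` bits farther
    from the anchor than the positive is.
    """
    d_pos = hamming_distance(anchor, positive)
    d_neg = hamming_distance(anchor, negative)
    return max(0, margin + d_pos - d_neg)
```

In a cross-modal setting, the query code would come from one modality (for example, an image) and the database codes from another (for example, text); `rank_by_hamming` then returns the text items closest to the image query in Hamming space.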
Keywords/Search Tags:Deep neural network, Hashing, Triplet labels, Cross-modal retrieval, Graph regularization