
Graph Convolutional Network Hashing For Cross-Modal Retrieval

Posted on: 2021-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Xu
Full Text: PDF
GTID: 2518306050966419
Subject: Circuits and Systems
Abstract/Summary:
With the rapid rise of social networks over the past few years, the amount of multimedia data generated on the Internet has grown enormously. Faced with such massive volumes of multimedia data, people need powerful cross-modal retrieval algorithms to meet the demand for similarity-based retrieval. Currently, the biggest challenge in cross-modal retrieval is how to overcome the heterogeneity between different modalities, which makes the modality gap difficult to bridge. To address this problem, researchers have proposed many cross-modal retrieval algorithms, among which cross-modal hashing methods have received widespread attention because of their high retrieval efficiency and low storage cost: while preserving the semantic similarity between data, cross-modal hashing methods use different hash functions to map data from different modalities into compact binary hash codes in Hamming space, and then use the XOR operation to compute the similarity between hash codes (illustrated below), which in turn measures the similarity between the data.

To mitigate the differences between modalities and bridge the modality gap, many cross-modal hashing algorithms focus on exploiting the semantic similarity between cross-modal data to extract more discriminative features and improve retrieval accuracy. However, we argue that cross-modal data exhibit not only semantic similarity but also similarity in spatial structure, and fully mining both types of similarity information can greatly improve the accuracy of cross-modal hashing. Based on this analysis, this thesis proposes two cross-modal retrieval algorithms based on graph convolutional neural networks, which use graph convolutional networks to learn the spatial-structure similarity of the data and combine it with the semantic similarity, leading to better retrieval accuracy. The main contents of this thesis are as follows:

(1) We propose a cross-modal retrieval method based on a self-attention graph convolutional neural network. We design an end-to-end deep network that combines feature extraction and hash-code learning. A feature-fusion method based on the self-attention mechanism fuses the features of the different modalities into a single representation. In addition, to better capture the structural similarity of the data, we use graph convolutional neural networks to embed the spatial information of the data into the fused features, and use the fused features to guide the learning of the feature-extraction network, which leads to better retrieval accuracy.

(2) We propose a cross-modal retrieval method based on a multi-graph-fusion graph convolutional neural network. We use the rich information contained in labels to build a semantic-spatial similarity graph. At the same time, graph convolutional neural networks are used to build separate data-graph structures for the data of each modality, and the supervised information is introduced into the feature-extraction process through multi-graph fusion, which effectively models both semantic and structural similarities, learns more discriminative features, and improves the accuracy of cross-modal retrieval.
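As an illustration of the XOR-based Hamming-distance computation described in the abstract, the following minimal Python sketch compares two binary hash codes; the code names and bit values are made up for the example and are not taken from the thesis.

```python
import numpy as np

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray) -> int:
    """Number of differing bits between two binary hash codes."""
    # XOR marks the positions where the two codes disagree;
    # counting the ones gives the Hamming distance.
    return int(np.count_nonzero(np.bitwise_xor(code_a, code_b)))

# Two illustrative 16-bit hash codes (hypothetical image and text codes).
img_code = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1], dtype=np.uint8)
txt_code = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1], dtype=np.uint8)

print(hamming_distance(img_code, txt_code))  # 3 -> small distance means high similarity
```

A smaller Hamming distance indicates a higher cross-modal similarity, which is why the hash codes can be compared efficiently with bitwise operations.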
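To show how a graph convolution can embed the spatial structure of the data into sample features, the sketch below implements the standard normalized propagation rule H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W). This is the generic graph-convolution formulation, not the thesis's exact network; the feature dimensions, adjacency matrix, and variable names are illustrative assumptions.

```python
import numpy as np

def gcn_layer(features: np.ndarray, adjacency: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))    # normalize by node degree
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features   # mix each node with its neighbours
    return np.maximum(propagated @ weights, 0.0)              # linear transform + ReLU

# Toy example: 4 samples with 8-d features and a similarity graph linking neighbouring samples.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8))
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
weights = rng.standard_normal((8, 16))

structure_aware = gcn_layer(features, adjacency, weights)
print(structure_aware.shape)  # (4, 16): features now carry neighbourhood structure
```

In the methods summarized above, features of this structure-aware kind are combined with semantic (label-based) similarity before hash-code learning, so that both kinds of similarity guide the binary codes.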
Keywords/Search Tags:Deep neural network, Hashing, Graph convolutional neural network, Cross-modal retrieval