
Research On Cross-modal Hashing Algorithms For Large-scale Multimedia Retrieval

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Shen
GTID: 2428330614450002
Subject: Computer Science and Technology
Abstract:
Cross-modal retrieval aims to provide flexible retrieval across different types of multimedia data (such as images, text, or video). Compared to traditional uni-modal retrieval tasks such as image-to-image retrieval, cross-modal retrieval enables a more adaptable retrieval experience, such as using a video to retrieve its detailed textual explanation. Cross-modal retrieval is a challenging problem because data from different modalities typically have different statistical properties and cannot be compared directly, a problem usually referred to as the heterogeneity gap. To address it, most methods project data from different modalities into a common space. With the emergence of big data, existing cross-modal retrieval methods also suffer from high computation and storage costs. Cross-modal hashing (CMH) was proposed to confront this scalability issue: it integrates hashing techniques to learn compact hash codes for the different modalities. In this work, we aim to propose more efficient cross-modal hashing methods.

First, we propose a new method called Semi-supervised Graph Convolutional Hashing Network (SGCH). Most traditional cross-modal hashing methods are supervised; they perform better but require tedious human effort to label the training data. In contrast, semi-supervised CMH methods, which leverage both labeled and unlabeled data, are more practical in real applications. We first model each modality as a graph and use graph convolution to preserve high-order intra-modal similarity and to propagate semantic information from labeled samples to unlabeled ones. We then use a siamese network to project the learned graph representations into compact hash codes. To further bridge the inter-modality gap, an adversarial loss, which learns modality-independent features by confusing a modality classifier, is incorporated into the overall loss function. Extensive results on the large-scale multimedia datasets NUS-WIDE-10K and Wiki demonstrate the effectiveness of SGCH.

Second, since the graph structure is of vital importance to the final retrieval performance, we further propose a method called Adaptive Semi-supervised Graph Convolutional Hashing (ASGCH). ASGCH uses the GraphSAGE algorithm to learn graph representations for the different modalities, which scales better to large graphs. Meanwhile, ASGCH trains a semantic classifier to predict labels for unlabeled data and adds the most confident predictions to the labeled set. It then uses the predicted labels to reconstruct the graph structures of the different modalities, repeating this process recursively. Experiments on three real-world datasets (MIRFLICKR-25K, NUS-WIDE-10K, and Wiki) show that ASGCH outperforms state-of-the-art methods.
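To make the SGCH ingredients described above concrete, the following is a minimal PyTorch sketch of graph convolution over a modality graph, a tanh-relaxed hashing layer, and a modality classifier used as the adversary. It is illustrative only: the module names, layer sizes, and two-layer depth are assumptions, not the thesis's actual implementation.

# Illustrative sketch only -- not the authors' code. Assumes PyTorch;
# module names, dimensions, and depth are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # a_hat: normalized adjacency (n x n); h: node features (n x in_dim)
        return F.relu(a_hat @ self.linear(h))

class ModalityEncoder(nn.Module):
    """Two GCN layers followed by a tanh-relaxed hashing layer;
    taking sign() of the output at retrieval time gives binary codes."""
    def __init__(self, in_dim, hid_dim, code_len):
        super().__init__()
        self.gc1 = GCNLayer(in_dim, hid_dim)
        self.gc2 = GCNLayer(hid_dim, hid_dim)
        self.hash = nn.Linear(hid_dim, code_len)

    def forward(self, a_hat, x):
        h = self.gc2(a_hat, self.gc1(a_hat, x))
        return torch.tanh(self.hash(h))  # continuous relaxation in [-1, 1]

class ModalityClassifier(nn.Module):
    """Adversary that tries to tell image codes from text codes; the
    encoders are trained to fool it, yielding modality-independent codes."""
    def __init__(self, code_len):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_len, 64), nn.ReLU(),
                                 nn.Linear(64, 2))

    def forward(self, codes):
        return self.net(codes)  # logits over {image, text}

In such a setup, one encoder per modality trained with a shared hashing objective plays the role of the siamese network, and the adversarial term alternates between training the classifier to separate the modalities and training the encoders to confuse it.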
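Similarly, here is a small NumPy sketch of an ASGCH-style self-training round: a semantic classifier pseudo-labels the unlabeled pool, only the most confident predictions are promoted to the labeled set, and the graph is rebuilt from the enlarged set. The confidence threshold, the single-label assumption, and the kNN graph rule are illustrative choices, not the thesis's exact procedure.

# Illustrative sketch only -- not the authors' code. Assumes single-label
# data and a classifier exposing fit/predict_proba (scikit-learn style).
import numpy as np

def knn_graph(features, k=10):
    """Rebuild a symmetric kNN adjacency matrix from current features."""
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]   # k nearest, skipping self
    adj = np.zeros_like(d)
    adj[np.arange(len(features))[:, None], idx] = 1.0
    return np.maximum(adj, adj.T)             # symmetrize

def self_training_round(clf, x_lab, y_lab, x_unlab, threshold=0.95):
    """One round: predict labels for unlabeled data, keep the most
    confident predictions, and reconstruct the graph structure."""
    clf.fit(x_lab, y_lab)
    proba = clf.predict_proba(x_unlab)
    keep = proba.max(axis=1) >= threshold     # most confident predictions
    x_new = np.vstack([x_lab, x_unlab[keep]])
    # map probability columns back to class labels (scikit-learn convention)
    y_new = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
    adj = knn_graph(x_new)                    # graph rebuilt from new labels
    return x_new, y_new, x_unlab[~keep], adj

Any probabilistic classifier with fit/predict_proba, such as scikit-learn's LogisticRegression, can stand in for the semantic classifier; repeating the round until no prediction clears the threshold mirrors the recursive graph reconstruction described in the abstract.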
Keywords: Cross-Modal Retrieval, Hashing Learning, Graph Convolutional Network, Semi-supervised Learning