
Supervised Hierarchical Cross-modal Hashing

Posted on: 2022-06-25 | Degree: Master | Type: Thesis
Country: China | Candidate: C C Sun | Full Text: PDF
GTID: 2518306608981009 | Subject: Computer technology
Abstract/Summary:
Recent years have witnessed unprecedented growth of multimedia data on the Internet, thanks to the flourishing of multimedia devices (e.g., digital cameras and smart mobile devices) that enable people to represent the same instance with different media types, such as text, image, and video. In light of this, given a query (e.g., a product or a topic), it is highly desirable to retrieve a comprehensive ranking list with rich information across various media types. Accordingly, more sophisticated multimedia similarity search technology merits special attention. In many real applications, the data to be processed is massive and high-dimensional, which creates a strong demand for search schemes with high speed and low storage cost. To ensure satisfactory efficiency, hashing has attracted considerable interest in the field of similarity search. The goal of hashing is to map high-dimensional data from the original space into a Hamming space of binary codes, such that the similarity and feature distribution among data points in the original space are well preserved in the Hamming space. In this way, both storage and computational costs can be dramatically reduced, as the Hamming distance reflecting instance similarity can be efficiently calculated with bit-wise XOR operations.

In this work, we aim to develop a more advanced similarity search scheme based on hashing. However, devising an effective hashing scheme for fast multimedia search is non-trivial due to the following challenges. 1) Heterogeneous modalities. As the heterogeneous data of different modalities (e.g., text, image, and video) reside in different feature spaces, accurately measuring their semantic similarity constitutes the major challenge. 2) Joint feature learning. Existing methods mainly rely on hand-crafted features, such as the Scale-Invariant Feature Transform (SIFT) and Bag-of-Visual-Words (BOVW), resulting in separate feature learning and hash code learning procedures; this pipeline may fail to reach optimal performance. 3) Effective supervision. As the labels of data points are significant cues conveying the semantic correlation among data points, one crucial challenge lies in how to effectively exploit these labels to supervise hash code learning and thereby improve retrieval performance.

In this thesis, we propose a new end-to-end solution for supervised cross-modal hashing, named HiCHNet, which explicitly exploits the hierarchical labels of instances. In particular, based on a pre-established label hierarchy, we comprehensively characterize each modality of an instance with a set of layer-wise hash representations. In essence, the hash codes are encouraged not only to preserve the layer-wise semantic similarities encoded by the label hierarchy, but also to retain hierarchical discriminative capability. Due to the lack of benchmark datasets, apart from adapting the existing fashion-domain dataset FashionVC, we create a new dataset from the online fashion platform Ssense, consisting of 15,696 image-text pairs labeled with 32 hierarchical categories. Extensive experiments on the two real-world datasets demonstrate the superiority of our model over state-of-the-art methods.
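The efficiency argument above rests on the fact that the Hamming distance between two binary codes is just an XOR followed by a bit count. A minimal sketch (the 8-bit codes below are invented for illustration; real systems typically use 16 to 128 bits):

```python
# Hamming distance between integer-packed binary hash codes.
# One XOR plus a popcount replaces a full high-dimensional
# distance computation, which is why hashing-based search is fast.

def hamming_distance(a: int, b: int) -> int:
    """Count the bits on which the two codes differ."""
    return bin(a ^ b).count("1")

# Two hypothetical 8-bit hash codes, e.g. for an image and a text.
image_code = 0b10110100
text_code  = 0b10010110

print(hamming_distance(image_code, text_code))  # -> 2
```

A smaller Hamming distance indicates a more similar pair, so ranking retrieval candidates reduces to sorting by this cheap integer operation.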
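To make the notion of layer-wise semantic similarity concrete, the following hedged sketch (not the HiCHNet implementation; the two-level hierarchy and category names are invented) shows how a label hierarchy induces one similarity signal per layer:

```python
# Layer-wise similarity derived from a label hierarchy (illustrative).
# Each leaf label is stored as its path from the root, so two instances
# can agree at a coarse layer while disagreeing at a fine one.

label_paths = {
    "sneakers": ("shoes", "sneakers"),
    "boots":    ("shoes", "boots"),
    "t-shirt":  ("tops", "t-shirt"),
}

def layer_similarity(label_a: str, label_b: str, layer: int) -> int:
    """1 if the two labels share the same ancestor at `layer`, else 0."""
    return int(label_paths[label_a][layer] == label_paths[label_b][layer])

# Sneakers and boots match at the coarse layer ("shoes") but not at
# the fine layer, so layer-wise hash codes can preserve both granularities.
print(layer_similarity("sneakers", "boots", 0))    # -> 1
print(layer_similarity("sneakers", "boots", 1))    # -> 0
print(layer_similarity("sneakers", "t-shirt", 0))  # -> 0
```

In this spirit, a model can learn one hash segment per layer, supervising each segment with the corresponding layer's similarity so that coarse and fine semantics are both retained.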
Keywords/Search Tags:Hashing Learning, Label Hierarchy, Cross-Modal Retrieval, Deep Learning