
Multimedia Retrieval-oriented Hashing

Posted on: 2020-01-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Jin
Full Text: PDF
GTID: 1488306512982409
Subject: Computer Science and Technology
Abstract/Summary:
The prevalence of social media websites (such as Facebook, YouTube, and Instagram) and digital electronic devices (such as digital cameras and mobile phones) has led to explosive growth in social media content, including videos, images, and text. When users search for topics of interest, returning relevant results quickly and accurately is especially challenging. Multimedia data is usually represented in a high-dimensional feature space, and data of different modalities has heterogeneous structure. Therefore, how to learn compact and discriminative representations for multimedia data has become a hot research topic in recent years. Hashing learns compact binary hash codes by projecting high-dimensional data into a common Hamming space, and it has attracted increasing research attention due to its computational and storage efficiency in approximate nearest neighbor search. In this thesis, we develop several efficient hashing frameworks for multimedia retrieval by combining hash learning with deep learning, and we show their applications to unimodal and cross-modal retrieval tasks. The main contributions of this thesis are summarised as follows:

(1) Most existing deep hashing methods learn hash functions directly by encoding global semantic information, while ignoring the local spatial information of images. The loss of local spatial structure creates a performance bottleneck for the hash functions and therefore limits their use in accurate similarity retrieval. In this work, we propose a novel Deep Ordinal Hashing (DOH) method, which learns ordinal representations by leveraging the ranking structure of the feature space from both local and global views. In particular, to build the ranking structure effectively, we propose to learn the rank correlation space by simultaneously exploiting the local spatial information from a Fully Convolutional Network (FCN) and the global semantic information from a Convolutional Neural Network (CNN). More specifically, an effective spatial attention model is designed to capture the local spatial information by selectively learning well-specified locations closely related to target objects. In this hashing framework, the local spatial and global semantic nature of images is captured in an end-to-end ranking-to-hashing manner.

(2) Most existing cross-modal hashing methods try to preserve the similarity relationship based on either metric distances or semantic labels in a Procrustean way, while ignoring the intra-class and inter-class variations inherent in the metric space. We propose a novel cross-modal hashing method, termed Semantic Neighbor Graph Hashing (SNGH), which aims to preserve a fine-grained similarity metric based on a semantic graph constructed by jointly pursuing semantic supervision and the local neighborhood structure. Specifically, the semantic graph is constructed to capture the local similarity structure of the image and text modalities respectively. Furthermore, we define a function based on the local similarity of the semantic graph to adaptively calculate multi-level similarities by encoding the intra-class and inter-class variations. After the unified hash codes are obtained, logistic regression with the kernel trick is employed to learn view-specific hash functions independently for each modality.

(3) Deep hashing has received unprecedented research attention in recent years, owing to its strong retrieval performance. However, most existing deep hashing methods learn binary hash codes by preserving the similarity relationship without exploiting the semantic labels, which results in suboptimal binary codes. In this work, we propose a novel Deep Semantic Multimodal Hashing Network (DSMHN) for scalable multimodal retrieval. In DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both the inter-modality similarities and the intra-modality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for task-specific classification, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels in the resultant hash codes. Unlike previous deep hashing methods, which are tied to particular forms of loss function, our deep hashing framework can be flexibly integrated with different types of loss functions. In addition, the bit balance property is investigated so that each bit of the generated binary codes has a 50% probability of being 1 or -1. Moreover, a unified hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, inter-modality similarity-preserving learning, semantic label-preserving learning, and hash function learning with a bit balance constraint.

(4) Most existing deep cross-modal hashing methods adopt binary quantization functions (e.g. sign(·)) to generate hash codes, which limits retrieval performance because binary quantization functions are sensitive to variations in numeric values. To address this, we propose a novel end-to-end ranking-based hashing framework, termed Deep Semantic-Preserving Ordinal Hashing (DSPOH), which learns hash functions with deep neural networks by exploring the ranking structure of feature dimensions. In DSPOH, an ordinal representation that encodes the relative rank ordering of feature dimensions is used to generate hash codes. Such an ordinal embedding benefits from the numeric stability of rank correlation measures. To make the hash codes discriminative, the ordinal representation is expected to predict the class labels well, so that the ranking-based hash function learning is optimally compatible with label prediction. Meanwhile, the inter-modality similarity is preserved to guarantee that the hash codes of different modalities are consistent. Importantly, DSPOH can be effectively integrated with different types of network architectures, which demonstrates the flexibility and scalability of the proposed hashing framework.
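
As a rough illustration of three ideas mentioned in the abstract (sign-based binary codes compared in Hamming space, the bit balance property, and rank-based ordinal codes), the following Python sketch may help. It is not the method of the thesis: the identity projection, the index subsets, the sample feature vector and all function names are assumptions chosen for illustration, and the ordinal code is a simple winner-take-all style encoding standing in for the learned ranking-based hash functions.

```python
def sign_hash(features, projections):
    """Binary code in {-1, +1}: the sign of each linear projection.

    `projections` is a hypothetical list of weight vectors, one per bit.
    """
    code = []
    for weights in projections:
        dot = sum(w * x for w, x in zip(weights, features))
        code.append(1 if dot >= 0 else -1)
    return code


def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def bit_balance(codes):
    """Fraction of codes whose k-th bit is +1, for every bit position k.

    A balanced bit has a fraction close to 0.5.
    """
    n = len(codes)
    return [sum(1 for c in codes if c[k] == 1) / n for k in range(len(codes[0]))]


def ordinal_hash(features, subsets):
    """Rank-based code (winner-take-all style, an illustrative stand-in).

    For each subset of dimension indices, store the position of the
    largest feature value inside that subset. Only the relative order
    of the values matters, not their magnitudes.
    """
    return [max(range(len(s)), key=lambda j: features[s[j]]) for s in subsets]


if __name__ == "__main__":
    x = [0.2, -0.1, 0.5, -0.4]          # made-up feature vector
    shifted = [v + 1.0 for v in x]      # same ordering, different values

    # Identity projection: each bit is the sign of one feature dimension.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    subsets = [[0, 1], [2, 3], [0, 2], [1, 3]]  # hypothetical index subsets

    print("sign codes:   ", sign_hash(x, identity), sign_hash(shifted, identity))
    print("Hamming dist.:", hamming_distance(sign_hash(x, identity),
                                             sign_hash(shifted, identity)))
    print("ordinal codes:", ordinal_hash(x, subsets), ordinal_hash(shifted, subsets))
    print("bit balance:  ", bit_balance([[1, -1], [-1, -1], [1, 1], [-1, 1]]))
```

In this toy setting, adding a constant offset to the features flips two of the four sign bits, while the ordinal code is unchanged because the relative ordering of the dimensions is preserved. This is only meant to convey the intuition behind the sensitivity of sign(·) and the numeric stability of rank-based codes.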
Keywords/Search Tags: multimedia retrieval, hashing, ranking-based hash function, Fully Convolutional Network, Convolutional Neural Network, spatial attention model