
Research On Cross-modal Hash Retrieval Algorithm Based On Unsupervised Learning

Posted on: 2024-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Zheng
Full Text: PDF
GTID: 2568307148963009
Subject: Software engineering
Abstract/Summary:
Cross-modal retrieval aims to retrieve information relevant to a query in one modality from data in another modality. As data volumes grow, the demands on storage and retrieval speed rise accordingly, which has motivated cross-modal hash retrieval technology. However, the natural heterogeneity of different modalities degrades performance when a trained cross-modal hashing model is used for retrieval, so narrowing the gap between modalities is a central concern of current research. Since deep learning has achieved great success in representation learning, researchers have proposed deep unsupervised cross-modal hash retrieval methods to alleviate the semantic heterogeneity between modalities and improve retrieval performance. These methods still face two problems: first, local features struggle to describe the complex semantic relationships between different entities, so the semantic information of intra-modal features is expressed inaccurately, weakening the semantic relevance between modalities; second, existing deep networks struggle to capture the semantic relationships between modalities, harming cross-modal semantic consistency. To address these problems, this paper proposes two unsupervised cross-modal hash retrieval methods:

(1) An unsupervised cross-modal hash retrieval method based on self-attention feature enhancement. To address the difficulty of describing complex semantic relationships between objects with local features alone, the method captures long-range dependencies within each modality through a multi-head self-attention mechanism, establishing global semantic relationships among the different semantic features. In addition, an adversarial loss is introduced into the network to further enhance semantic consistency between modalities, producing more robust, higher-quality hash codes and thereby improving cross-modal retrieval performance. The method is validated on three public cross-modal hashing datasets: Wiki, MIRFlickr-25K, and NUS-WIDE.

(2) Building on the self-attention feature-enhancement method above, a dual unsupervised cross-modal hash retrieval method with heterogeneous feature interaction, addressing the difficulty existing deep networks have in capturing semantic relationships between modalities. Specifically, exploiting the semantic consistency between object entities of different modalities, a heterogeneous feature interaction module fuses the features; in particular, a cross-attention module strengthens the interaction between features of different modalities and learns the similar semantic relationships between them. Finally, a dual hash module is designed for the image and text modalities: two different networks perform local and global feature extraction for each modality, which not only enhances the local features and global dependencies of each modality but also effectively suppresses redundant information between modalities. Extensive experiments show that the proposed methods significantly improve cross-modal hash retrieval performance. Compared with the state-of-the-art AGCH method, the average mAP@50 on MIRFlickr-25K, averaged over hash codes of different lengths, improves by 3.5% on the I→T task and 7.9% on the T→I task.
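
To make the self-attention feature enhancement in method (1) concrete, below is a minimal PyTorch sketch of the general idea: per-modality local features are refined with multi-head self-attention to capture long-range dependencies, then pooled and mapped to relaxed hash codes. The module, its dimensions, the mean pooling, and the tanh hashing head are illustrative assumptions, not the thesis's actual architecture.

import torch
import torch.nn as nn

class SelfAttentionHashEncoder(nn.Module):
    # Hypothetical encoder: enhances local features of one modality with
    # multi-head self-attention (global semantic relationships), then maps
    # the pooled representation to relaxed binary hash codes.
    def __init__(self, feat_dim=512, num_heads=8, hash_bits=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.hash_head = nn.Linear(feat_dim, hash_bits)

    def forward(self, x):
        # x: (batch, num_regions_or_tokens, feat_dim) local features
        attn_out, _ = self.attn(x, x, x)           # long-range dependencies
        x = self.norm(x + attn_out)                # residual enhancement
        pooled = x.mean(dim=1)                     # aggregate to one vector
        return torch.tanh(self.hash_head(pooled))  # relaxed codes in (-1, 1)

At retrieval time the relaxed outputs would be binarized, e.g. codes = torch.sign(encoder(features)).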
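
Method (1) also mentions an adversarial loss for cross-modal consistency. One common realization, sketched here purely as an assumption about how such a loss can be wired up, is a modality discriminator that the hash encoders learn to fool:

import torch
import torch.nn as nn

class ModalityDiscriminator(nn.Module):
    # Hypothetical discriminator: predicts whether a hash representation
    # came from the image branch (label 1) or the text branch (label 0).
    def __init__(self, hash_bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hash_bits, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, code):
        return self.net(code)  # raw logit

bce = nn.BCEWithLogitsLoss()

def adversarial_losses(disc, img_code, txt_code):
    ones = torch.ones(img_code.size(0), 1)
    zeros = torch.zeros(txt_code.size(0), 1)
    # Discriminator step: learn to tell the two modalities apart.
    d_loss = bce(disc(img_code.detach()), ones) + bce(disc(txt_code.detach()), zeros)
    # Encoder step: make text codes indistinguishable from image codes.
    g_loss = bce(disc(txt_code), ones)
    return d_loss, g_loss

Training would alternate the two losses GAN-style, driving both modalities toward a shared hash space.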
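
Likewise, the heterogeneous feature interaction in method (2) centers on cross-attention between modalities. A minimal sketch, again with assumed names and dimensions: each modality's features query the other's, so the two branches learn similar semantic relationships.

import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    # Hypothetical interaction module: image features attend over text
    # features and vice versa; each branch keeps a residual connection.
    def __init__(self, feat_dim=512, num_heads=8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, img_feat, txt_feat):
        # img_feat: (batch, regions, feat_dim); txt_feat: (batch, tokens, feat_dim)
        img_enh, _ = self.img2txt(img_feat, txt_feat, txt_feat)  # image queries text
        txt_enh, _ = self.txt2img(txt_feat, img_feat, img_feat)  # text queries image
        return img_feat + img_enh, txt_feat + txt_enh

The fused outputs could then feed the dual hash module's local and global branches.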
Keywords/Search Tags: Cross-modal hash retrieval, Heterogeneous semantic gap, Self-attention, Feature interaction, Dual hash