
Research On Unsupervised Hash Retrieval Methods For Multi-modal Data

Posted on: 2023-04-26    Degree: Master    Type: Thesis
Country: China    Candidate: X Z Wu    Full Text: PDF
GTID: 2568306614992179    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology and mobile terminals, multi-modal data have been growing explosively. Retrieving these multi-modal data efficiently and quickly, so as to meet users' increasingly complex multi-modal retrieval needs, has become an important challenge. Unsupervised hashing offers good scalability, low storage cost, and high retrieval efficiency, and can effectively support the retrieval of large-scale multi-modal data.

Existing unsupervised hashing methods for multi-modal data fall mainly into unsupervised cross-modal hashing and unsupervised multi-modal hashing. Unsupervised multi-modal hashing methods exploit the complementary information among multi-modal data and fuse them into unified hash codes that represent each multimedia object more comprehensively; unsupervised cross-modal hashing methods learn hash functions that map data of one modality into compact hash codes, so that semantically similar data of other modalities can be retrieved from the database.

Although existing unsupervised hash learning methods for multi-modal data have made some progress, two important problems remain. (1) All existing unsupervised multi-modal hashing methods use matrix factorization to learn hash codes, which limits their ability to fuse heterogeneous multi-modal features; moreover, they model semantic relationships by constructing graphs, which introduces large computational complexity and space cost. (2) Existing unsupervised cross-modal hashing methods ignore the connection between identity semantics and correlation semantics when modeling and preserving multi-modal intrinsic semantics, and simply preserve the two separately in the final hash codes; in addition, they neglect to preserve the already modeled multi-modal semantics in the hash functions during hash function learning, which reduces the generalization ability of the hash functions. This thesis proposes the following two unsupervised hashing methods to address these problems.

(1) This thesis proposes an efficient Multi-modal Discrete Tensor Decomposition Hashing (MDTDH) method. Specifically, the method first applies a non-linear mapping to capture the non-linear semantic structure of each modality's features, stacks the mapped features into a three-dimensional tensor, and decomposes this tensor into a core tensor and two factor matrices via Tucker2 decomposition. Simultaneously, it learns a hash function that maps the non-linear features of the training data to their corresponding hash codes, so that out-of-sample queries can be handled in large-scale online retrieval. To reduce quantization loss and computational cost, this thesis further proposes a fast discrete optimization strategy that generates discrete hash codes directly. Extensive performance comparison and ablation experiments on three commonly used datasets verify the superiority of the proposed method from different perspectives.
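To make the tensor-fusion idea concrete, the following Python sketch stacks non-linearly mapped modality features into a three-dimensional tensor and performs a Tucker2-style decomposition into a core tensor and two factor matrices. The anchor-based RBF mapping, the HOSVD-style factor estimation, and the sign binarization are illustrative assumptions; they do not reproduce the thesis's fast discrete optimization.

```python
import numpy as np

def rbf_features(X, anchors, gamma=1.0):
    """Non-linear (RBF) mapping of raw features onto a set of anchor points."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def tucker2_hash(tensor3d, code_len):
    """Tucker2-style decomposition of a (modality x sample x feature) tensor
    into a core tensor and two factor matrices, followed by a crude sign
    binarization of the sample factor (the thesis uses discrete optimization)."""
    M, N, D = tensor3d.shape
    # Mode-wise unfoldings along the sample mode and the feature mode.
    unfold_samples = tensor3d.transpose(1, 0, 2).reshape(N, M * D)
    unfold_features = tensor3d.transpose(2, 0, 1).reshape(D, M * N)
    U, _, _ = np.linalg.svd(unfold_samples, full_matrices=False)
    V, _, _ = np.linalg.svd(unfold_features, full_matrices=False)
    U, V = U[:, :code_len], V[:, :code_len]              # leading factor matrices
    core = np.einsum('mnd,nk,dl->mkl', tensor3d, U, V)   # core tensor
    codes = np.sign(U)                                    # illustrative binarization
    return core, U, V, codes

# Hypothetical usage: two modalities with random stand-in features.
X_img, X_txt = np.random.randn(500, 128), np.random.randn(500, 64)
phi = np.stack([rbf_features(X_img, X_img[:100]),
                rbf_features(X_txt, X_txt[:100])])        # shape (2, 500, 100)
core, U, V, codes = tucker2_hash(phi, code_len=32)
```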
(2) This thesis proposes a Correlation-Identity Reconstruction Hashing (CIRH) method for cross-modal retrieval. Specifically, the method first constructs a multi-modal collaborative graph to effectively model the adjacency relationships among heterogeneous multi-modal data, and preserves the multi-modal intrinsic semantics in the hash codes through the proposed correlation semantic reconstruction and identity semantic reconstruction strategies. In addition, it introduces a cross-modal semantic aggregation module that explores a shared space and generates discriminative feature representations by enhancing the information interaction between heterogeneous modalities and mining their complementary information. Finally, unlike existing methods, it adopts a correlation-identity semantic consistency strategy for hash function learning, which preserves the modeled multi-modal intrinsic semantics in the deep hash function network of each modality. Extensive performance comparison and ablation experiments on three commonly used datasets verify the superiority of the proposed method from different perspectives.
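As a rough illustration of the correlation- and identity-reconstruction idea, the PyTorch sketch below fuses per-modality cosine affinity graphs and penalizes relaxed hash codes whose inner products, or decoded features, deviate from them. The class name, the averaged graph fusion, and the linear identity decoder are simplifying assumptions, not the thesis's collaborative graph or aggregation module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cosine_affinity(feats):
    """Per-modality affinity (adjacency) matrix from L2-normalized features."""
    z = F.normalize(feats, dim=1)
    return z @ z.t()

class CIRHStyleLosses(nn.Module):
    """Simplified correlation- and identity-reconstruction losses on relaxed
    hash codes; a stand-in for the strategies described in the abstract."""
    def __init__(self, code_len, fused_dim):
        super().__init__()
        # Hypothetical linear decoder used for identity reconstruction.
        self.decoder = nn.Linear(code_len, fused_dim)

    def forward(self, img_feats, txt_feats, codes):
        # Collaborative affinity: fuse the per-modality graphs (simple average).
        S = 0.5 * (cosine_affinity(img_feats) + cosine_affinity(txt_feats))
        B = torch.tanh(codes)                              # relaxed binary codes
        # Correlation reconstruction: code similarities should reproduce S.
        corr_loss = F.mse_loss(B @ B.t() / B.shape[1], S)
        # Identity reconstruction: each code should recover its fused feature.
        fused = torch.cat([img_feats, txt_feats], dim=1)
        ident_loss = F.mse_loss(self.decoder(B), fused)
        return corr_loss, ident_loss

# Hypothetical usage with random stand-in features and codes.
img, txt = torch.randn(64, 512), torch.randn(64, 300)
codes = torch.randn(64, 32, requires_grad=True)
loss_fn = CIRHStyleLosses(code_len=32, fused_dim=512 + 300)
corr, ident = loss_fn(img, txt, codes)
(corr + ident).backward()
```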
Keywords/Search Tags:Multi-modal data retrieval, Multi-modal hashing, Cross-modal hashing, Semantic preservation, Unsupervised learning