
Research On Hierarchical Supervised Cross-modal Image And Text Retrieval Based On Deep Hashing

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: R D Chen
Full Text: PDF
GTID: 2518306554971449
Subject: Master of Engineering
Abstract
With the rapid development of the Internet and the Internet of Things, a large amount of valuable multimodal data is being generated. Finding relevant multimodal information quickly and efficiently in such massive data sources is extremely important, which gives cross-modal retrieval both practical application value and research significance. Cross-modal retrieval takes query data of one modality and returns data of other modalities that are semantically related to it. Most existing cross-modal retrieval algorithms are designed for non-hierarchical supervised information and cannot fully exploit the rich supervision carried by hierarchical labels; they do not sufficiently minimize the distance, in the common subspace, between multimodal data sharing the same semantics, nor do they sufficiently separate data of different semantic categories. To address these problems, this thesis studies cross-modal image and text retrieval in the following three aspects.

(1) Current cross-modal retrieval algorithms fail to adequately minimize the distance in the common space between data of multiple modalities that carry the same semantics, do not adequately consider the inter-layer correlation of hierarchically supervised information, and cannot fully learn the complex correlation information between label layers. To solve these problems, an adversarial hierarchical supervised deep hashing for cross-modal retrieval algorithm (AHSDH) is proposed. AHSDH is built on the adversarial idea: the feature extraction networks act as the generator and a modality discrimination network acts as the adversary, and the two are trained adversarially so that data of different modalities with the same semantics lie as close as possible in the common subspace (see the first sketch after this abstract). An intra-label-layer similarity loss and an inter-label-layer correlation loss are also introduced to fully exploit the intrinsic similarity within each label layer and the correlation between label layers, thereby improving the accuracy of cross-modal retrieval.

(2) AHSDH represents text-modality data with a bag-of-words model, which ignores the semantic relevance within the text. To address this problem, a multiscale feature stacking model-based hierarchical supervised deep hashing for cross-modal retrieval algorithm (MSFSM-net) is proposed. The algorithm extracts text features with a multiscale feature stacking model built from mean pooling layers at different scales, which fully accounts for the semantic relevance of the text modality (a sketch of this pooling scheme is also given below).

(3) To address the interference that modal data of different semantic categories cause during retrieval, a different-semantic-distinctions-based hierarchical supervised deep hashing for cross-modal retrieval algorithm (DSD-net) is proposed. Separate objective functions are set for same-modality and cross-modality data of different semantic categories, so that data of different semantic categories are pushed away from each other in the common space, avoiding their interference with retrieval (see the final sketch below).
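The adversarial setup in (1) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering of the generator/adversary interplay described above, not the thesis's actual implementation; all module names, layer sizes, and the binary cross-entropy formulation are assumptions.

```python
# Hypothetical sketch of the adversarial idea in AHSDH: the feature
# extractors act as the "generator" and a modality classifier acts as
# the adversary. Names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityDiscriminator(nn.Module):
    """Predicts whether a common-space feature came from image or text."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 2), nn.ReLU(),
            nn.Linear(dim // 2, 1))  # logit: image (1) vs. text (0)

    def forward(self, f):
        return self.net(f)

def adversarial_losses(disc, img_feat, txt_feat):
    bce = nn.BCEWithLogitsLoss()
    ones = torch.ones(img_feat.size(0), 1, device=img_feat.device)
    zeros = torch.zeros(txt_feat.size(0), 1, device=txt_feat.device)
    # Discriminator objective: tell the two modalities apart.
    d_loss = bce(disc(img_feat.detach()), ones) + \
             bce(disc(txt_feat.detach()), zeros)
    # Generator objective: fool the discriminator so that image and
    # text features become indistinguishable in the common subspace.
    g_loss = bce(disc(img_feat), zeros) + bce(disc(txt_feat), ones)
    return d_loss, g_loss
```

In this reading, driving the discriminator toward chance performance is what pulls same-semantics features of the two modalities together in the common space.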
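The multiscale text pooling in (2) might look like the following sketch, assuming the text has first been mapped to a sequence of token embeddings; the pooling scales and class names are illustrative assumptions, not the thesis's exact configuration.

```python
# Illustrative sketch of multiscale feature stacking for text via mean
# pooling at several scales. Scales are assumptions; the input sequence
# is assumed to be at least as long as the largest scale.
import torch
import torch.nn as nn

class MultiScaleTextPooling(nn.Module):
    """Stacks mean-pooled views of a token-embedding sequence."""
    def __init__(self, scales=(1, 2, 4, 8)):
        super().__init__()
        # One average-pooling layer per scale (kernel = stride = scale).
        self.pools = nn.ModuleList(
            [nn.AvgPool1d(kernel_size=s, stride=s) for s in scales])

    def forward(self, tokens):           # tokens: (batch, seq_len, dim)
        x = tokens.transpose(1, 2)       # -> (batch, dim, seq_len)
        views = [p(x).mean(dim=2) for p in self.pools]  # (batch, dim) each
        return torch.cat(views, dim=1)   # stacked multiscale feature
```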
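Finally, the semantic-separation objective in (3) can be sketched as a hinge-style loss that pushes apart common-space features whose categories differ; the margin value and the choice of L2 distance are assumptions for illustration, and the same function could be applied to same-modality or cross-modality feature pairs.

```python
# Minimal sketch of the DSD-net idea: keep features of different
# semantic categories at least `margin` apart in the common space.
# Margin and distance metric are illustrative assumptions.
import torch
import torch.nn.functional as F

def separation_loss(feat_a, feat_b, labels_a, labels_b, margin=1.0):
    """Hinge loss over pairs whose category labels differ; feat_a and
    feat_b may come from the same modality or from different ones."""
    dist = torch.cdist(feat_a, feat_b)  # pairwise L2 distances (Na, Nb)
    diff = (labels_a.unsqueeze(1) != labels_b.unsqueeze(0)).float()
    return (diff * F.relu(margin - dist)).sum() / diff.sum().clamp(min=1)
```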
Keywords/Search Tags: cross-modal image-text retrieval, deep hash algorithm, hierarchical supervision, adversarial network, multiscale feature