
Study on Deep Cross-Modal Retrieval with Semantic Ranking Structure Preservation

Posted on: 2022-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2518306536463684
Subject: Computer Science and Technology
Abstract/Summary:
Compared with single-modal retrieval, cross-modal retrieval enables mutual retrieval across different modalities, which is more convenient and flexible. However, the massive volume and diversity of multimedia data pose great challenges to cross-modal retrieval. Because the modalities differ, their feature representations and feature distributions also differ and cannot be compared directly. How to achieve large-scale cross-modal retrieval while guaranteeing the diversity and accuracy of the retrieval results is therefore a problem worth studying in the field of multimedia retrieval.

This thesis studies multi-label cross-modal retrieval between two modalities, image and text. It adopts real-valued common-space learning based on deep learning to address three shortcomings of current multi-label cross-modal retrieval methods: insufficient use of semantic category information, insufficient quantification of the similarity between multi-modal data, and similarity rankings of the common representations that do not reflect the actual degree of similarity. A corresponding method or strategy is proposed for each. The main contents and innovations of this thesis are as follows:

(1) A graph-convolution-based approach for preserving the semantic category relationship structure in the multi-modal common representation. A graph convolutional network applied to the category relationship graph produces a set of classifiers that retain inter-category dependencies; these classifiers act on the common representations of the multi-modal data, so that the relationship structure of the semantic categories in the label space is preserved, the common representations become more discriminative, and latent associations between samples are mined from the relationships among semantic categories (see the first sketch after this abstract).

(2) A similarity measurement method for multi-modal data. Based on the number of shared category labels, the similarity between multi-modal samples is quantified at multiple levels, and the similarities of visual features and high-level textual semantic features are further fused to construct a similarity matrix that distinguishes degrees of similarity (see the second sketch below).

(3) A similarity ranking method for the common representations based on pairwise constraints. When the original features of each modality are mapped into the common space, dynamic margin thresholds derived from the similarity matrix constrain the distances between common representations so that they are ordered by similarity. The semantic ranking structure of the common representations is thus retained, and retrieval results are returned in descending order of similarity (see the third sketch below).

(4) Building on the above three points, this thesis proposes a common-space learning method with semantic ranking structure preservation (SRSP) for the image and text modalities, and uses it to construct a deep cross-modal retrieval framework. SRSP enables the common representations to retain the similarity ranking structure while also preserving the semantic category relationship structure on top of semantic discriminability. Ablation and comparison experiments on the MS COCO and NUS-WIDE cross-modal retrieval datasets verify the effectiveness and superiority of the SRSP method.
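The abstract gives no implementation details for contribution (1); the following is a minimal sketch of one way graph-convolutional label classifiers can act on a common representation, in the spirit of ML-GCN-style architectures. The layer sizes, the normalized adjacency matrix built from label co-occurrence, and the initial label embeddings are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConvLayer(nn.Module):
    """One graph convolution step: H' = A_hat @ H @ W."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h, a_hat):
        return a_hat @ h @ self.weight

class LabelGCNClassifiers(nn.Module):
    """Turns a category relationship graph into per-category classifiers
    that score the common representation, so the classifiers themselves
    carry the inter-category dependency structure (assumed design)."""
    def __init__(self, label_emb, adj, common_dim=512):
        super().__init__()
        # label_emb: (C, d) initial label embeddings, e.g. word vectors
        # adj: (C, C) normalized category co-occurrence graph A_hat
        self.register_buffer("label_emb", label_emb)
        self.register_buffer("adj", adj)
        self.gc1 = GraphConvLayer(label_emb.size(1), 1024)
        self.gc2 = GraphConvLayer(1024, common_dim)

    def forward(self, z):
        # z: (B, common_dim) common representation of image or text
        w = F.leaky_relu(self.gc1(self.label_emb, self.adj), 0.2)
        w = self.gc2(w, self.adj)   # (C, common_dim): one classifier per category
        return z @ w.t()            # (B, C) category scores
```

The same classifier bank is applied to both image and text representations, which is one way to push both modalities toward the same label-space structure.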
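For contribution (2), a hedged sketch of the similarity matrix: the semantic level quantifies similarity by the number of shared labels (normalized Jaccard-style here), and is fused with cosine similarities of the visual and textual features. The fusion weights alpha and beta and the normalization scheme are assumptions; the thesis may combine the levels differently.

```python
import torch
import torch.nn.functional as F

def build_similarity_matrix(labels, img_feat, txt_feat, alpha=0.6, beta=0.2):
    """labels: (N, C) multi-hot label matrix.
    img_feat: (N, Dv) visual features; txt_feat: (N, Dt) text semantic features.
    Returns an (N, N) fused similarity matrix in [0, 1] (assumed fusion)."""
    l = labels.float()
    shared = l @ l.t()                                   # shared-label counts
    union = l.sum(1, keepdim=True) + l.sum(1) - shared   # label-set union sizes
    sem = shared / union.clamp(min=1)                    # multi-level semantic similarity
    sim_v = F.normalize(img_feat, dim=1) @ F.normalize(img_feat, dim=1).t()
    sim_t = F.normalize(txt_feat, dim=1) @ F.normalize(txt_feat, dim=1).t()
    return alpha * sem + beta * sim_v + (1 - alpha - beta) * sim_t
```

Sharing two labels thus yields a strictly higher semantic score than sharing one, which is what lets the matrix distinguish degrees of similarity rather than a binary similar/dissimilar split.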
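For contribution (3), a sketch of a pairwise constraint with a dynamic margin derived from the similarity matrix: the less similar a pair, the larger the margin, so distances in the common space end up ordered by similarity. The linear margin schedule base + scale * (1 - sim) is an assumed form, not the thesis's stated formula.

```python
import torch

def dynamic_margin_ranking_loss(z_img, z_txt, sim, base=0.1, scale=1.0):
    """z_img, z_txt: (N, D) common representations of paired image/text.
    sim: (N, N) fused similarity matrix with values in [0, 1].
    Each true pair must be closer than every cross pair by a margin that
    grows as that cross pair's similarity shrinks (assumed schedule)."""
    dist = torch.cdist(z_img, z_txt)          # (N, N) pairwise distances
    pos = dist.diag()                         # distances of the true pairs
    margin = base + scale * (1.0 - sim)       # dynamic boundary threshold
    loss = torch.relu(pos.unsqueeze(1) - dist + margin)
    loss = loss - torch.diag(loss.diag())     # drop the true-pair diagonal terms
    return loss.mean()
```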
Keywords/Search Tags: Cross-modal retrieval, Common space learning, Graph convolutional network, Semantic structure preserving