
Research on Multi-Scale Fusion Cross-Modal Retrieval Based on Deep Learning

Posted on: 2022-11-08    Degree: Master    Type: Thesis
Country: China    Candidate: K Q Zhao    Full Text: PDF
GTID: 2518306743474194    Subject: Computer technology
Abstract/Summary:
With the explosive growth of multi-modal data in recent years, how to establish semantic associations across multi-modal data for better management has become a hot topic in deep learning research. Cross-modal retrieval has applications in image-text retrieval, sketch-based search, and recipe retrieval. Because multi-modal data have different representations and underlying structures, it is difficult to measure the similarity between multi-modal data directly. In this thesis, cross-modal retrieval is studied from the two perspectives of common-space learning and correlation learning, and the following work is accomplished.

We propose a method named Dual-Scale Similarity with Rich Features for Cross-Media Retrieval (DSRF), which fuses the similarity of category labels with the similarity of contained objects to measure the similarity of multi-modal data. Most existing methods map data of different modalities into a common space using category labels and pairwise relationships; however, other discriminative information contained in multi-modal data is ignored. In this thesis, results that belong to the same category as the query sample but contain few identical objects receive an appropriate penalty, while correct results (with the same labels and many identical objects) receive greater rewards. In addition, a new semantic feature extraction framework is designed to provide rich semantic information. Multiple attention maps are created to obtain multiple semantic features. Unlike other works that cumulatively average multiple semantic representations, an LSTM with only forgetting gates is used to eliminate redundant information. Specifically, a forgetting factor is generated for each semantic feature, and unimportant semantics are assigned a larger forgetting factor. The mAP and R@K scores on MSCOCO are increased, improving retrieval accuracy significantly.

A multi-scale alignment cross-modal retrieval method (MACMR) is proposed, which measures the relevance of multi-modal data through fused alignment at three levels: global, local object, and action-position relationship. Most existing works focus on alignment at the global level or the local level and ignore the relationship information (action and position) between locally significant regions, which is very important for cross-modal retrieval. In this thesis, relationship-level alignment is added to the global- and local-level alignment. Specifically, a cross-modal multi-path network is constructed to extract relevant information at the global, local, and relationship levels respectively. Object regions are obtained by object detection, the intersection regions between objects are taken as relationship regions, and both object regions and relationship regions are aligned with the corresponding descriptors in the text data. Image regions and text keywords that cannot be matched are removed by a joint attention mechanism, achieving better cross-modal image-text retrieval by aligning image and text data at the three scales adaptively. Extensive experiments conducted on the MSCOCO dataset improve the R@K scores significantly.
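To make the dual-scale idea in DSRF concrete, below is a minimal Python sketch (not the thesis implementation) of fusing label-level similarity with object-level overlap, so that results sharing both the query's labels and many of its objects score higher than results that share only the labels. The function names, the Jaccard overlap, and the fusion weight are assumptions for illustration.

    # Minimal sketch: fuse category-label similarity with object-level overlap.
    # `object_overlap`, `dual_scale_similarity`, and `label_weight` are hypothetical names.

    def object_overlap(query_objects: set, result_objects: set) -> float:
        """Jaccard overlap between the object sets detected in two samples."""
        if not query_objects and not result_objects:
            return 0.0
        return len(query_objects & result_objects) / len(query_objects | result_objects)

    def dual_scale_similarity(query_labels: set, result_labels: set,
                              query_objects: set, result_objects: set,
                              label_weight: float = 0.5) -> float:
        """Results with the same labels and many shared objects are rewarded;
        results with the same labels but few shared objects are penalised."""
        label_sim = 1.0 if query_labels & result_labels else 0.0
        overlap = object_overlap(query_objects, result_objects)
        return label_weight * label_sim + (1.0 - label_weight) * label_sim * overlap

    # Same category, but the second result shares more objects and scores higher.
    print(dual_scale_similarity({"sports"}, {"sports"}, {"dog", "frisbee"}, {"dog"}))
    print(dual_scale_similarity({"sports"}, {"sports"}, {"dog", "frisbee"}, {"dog", "frisbee"}))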
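The "LSTM with only forgetting gates" described for DSRF can likewise be sketched. The following PyTorch code is an assumed, simplified reading of that mechanism: each attention-derived semantic feature receives its own forgetting factor, and semantics judged redundant are weighted down rather than cumulatively averaged. The class name ForgetOnlyAggregator and the exact gating formula are hypothetical.

    # Assumed sketch of aggregating K semantic features with per-feature forgetting factors.
    import torch
    import torch.nn as nn

    class ForgetOnlyAggregator(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            # Produces a forgetting factor for each incoming semantic feature,
            # conditioned on the feature itself and the running summary.
            self.forget_gate = nn.Linear(2 * dim, dim)

        def forward(self, semantics: torch.Tensor) -> torch.Tensor:
            # semantics: (K, dim) -- K semantic features from K attention maps.
            state = torch.zeros(semantics.size(1))
            for s in semantics:
                f = torch.sigmoid(self.forget_gate(torch.cat([state, s])))
                # Unimportant or redundant semantics receive a larger forgetting
                # factor f, so they contribute less to the fused representation.
                state = state + (1.0 - f) * s
            return state

    agg = ForgetOnlyAggregator(dim=256)
    features = torch.randn(4, 256)   # e.g. 4 attention maps -> 4 semantic vectors
    summary = agg(features)          # fused semantic representation, shape (256,)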
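For MACMR, the relationship regions taken as intersections of detected object regions can be illustrated with a small geometric helper. This is a sketch under the assumption that detections are axis-aligned (x1, y1, x2, y2) boxes; relationship_regions is a hypothetical name, not part of the thesis code.

    # Assumed sketch: pairwise intersections of object boxes serve as relationship regions.
    from itertools import combinations

    def box_intersection(a, b):
        """Intersection of two boxes (x1, y1, x2, y2); None if they do not overlap."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        if x1 >= x2 or y1 >= y2:
            return None
        return (x1, y1, x2, y2)

    def relationship_regions(object_boxes):
        """Collect the non-empty pairwise intersections of detected object boxes."""
        regions = []
        for a, b in combinations(object_boxes, 2):
            inter = box_intersection(a, b)
            if inter is not None:
                regions.append(inter)
        return regions

    # Two overlapping detections yield one relationship region; the third is isolated.
    print(relationship_regions([(0, 0, 50, 50), (30, 30, 80, 80), (100, 100, 120, 120)]))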
Keywords/Search Tags:Common Space Learning, Correlation Learning, Cross-media Retrieval, Multi-scale Fusion, Semantic Feature Extraction