
Research on Multi-Modal and Multi-Grained Network-Based Cross-Media Retrieval

Posted on: 2022-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: S J Yuan
Full Text: PDF
GTID: 2518306746996249
Subject: Master of Engineering
Abstract/Summary:
With the widespread use of multimedia devices and the rapid development of the Internet, a great number of users post multimedia data (such as short videos, images, and texts) on social media platforms. There is an enormous demand for accurately retrieving information from this massive multimedia data, which has brought cross-media retrieval extensive attention and profound study in academia. Cross-media retrieval uses one type of media as the query and returns results that share similar semantic information but belong to different media types. Because different media types have different encodings, cross-modal similarity cannot be measured directly; this heterogeneous gap makes cross-media retrieval a difficult and challenging task. Existing research on cross-media retrieval has made great progress, yet it still fails to close the heterogeneous gap. Furthermore, existing studies have not deeply examined the positive role of multi-modal and multi-grained data in bridging this gap. To reduce the heterogeneous gap, different data modalities can be connected by increasing the cross-modal similarity between modalities that have a higher semantic correlation with each other. In addition, since the semantic information contained in multi-grained data is complementary, it is of great importance to narrow the heterogeneous gap by fully mining this complementary information and thereby achieving semantic enhancement. This paper proposes two cross-media retrieval networks that make full use of multi-modal and multi-grained data:

(1) Most existing works focus on single-grained data and exploit only binary values to distinguish correlations between cross-modal data. To address these problems, this paper proposes a cross-media retrieval network based on a multi-margin triplet loss function and coarse- and fine-grained feature fusion. The network is divided into two parts: Network I, a coarse- and fine-grained feature fusion network based on Deep Belief Networks; and Network II, a multi-modal data fusion network based on the multi-margin triplet loss function. We innovatively propose a multi-margin triplet loss function: according to the margins in a margin set, features belonging to different modalities and different semantic categories are separated from the anchors in a multi-margin manner. Comparisons with existing methods and ablation experiments demonstrate that the proposed method improves cross-media retrieval performance. In particular, the strategies of fusing coarse- and fine-grained data and of distinguishing irrelevant data in a multi-margin manner are effective.

(2) Most existing works have not fully considered the complementary relationship between foreground objects and background information. Although several cross-media retrieval works focus on fusing coarse- and fine-grained data, research on fusing more than two data granularities is still lacking. Hence, this paper proposes a cross-media retrieval network that combines object detection with multi-grained data alignment. The network is divided into two parts: an object detection sub-network and multi-grained sub-networks. First, we use object detection to extract foreground objects and innovatively build an object detection sub-network. Second, we further divide multi-grained data into multi-level fine-grained and coarse-grained data: a sliding-window strategy divides images and texts into different fine-grained levels, from which we construct the multi-level fine-grained sub-networks and the coarse-grained sub-network. Finally, we linearly fuse the similarity matrices of the object detection sub-network and the multi-grained sub-networks; the fused matrix reflects the complementary relationship between foreground objects and multi-level backgrounds. Comparisons with existing methods show that the proposed method effectively improves cross-media retrieval performance. The ablation experiments further prove that each sub-network has a positive effect on retrieval performance, and they also confirm the effectiveness of the sliding-window strategy for dividing multi-grained data.

In summary, this paper studies cross-media retrieval based on multi-modal and multi-grained networks. First, we design a multi-margin triplet loss function to constrain the relationships among multi-modal data. We then divide data into multiple granularity levels and explore the complementarity between image foreground objects and multi-level backgrounds. Overall, we fully utilize semantic correlation to model multi-modal data relationships, and further mine and fuse the complementary semantic information among multi-grained data. This work plays an important and positive role in narrowing the heterogeneous gap and effectively improving cross-media retrieval performance.
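The multi-margin triplet loss of contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the Euclidean distance, the mean reduction, and the rule for assigning each negative its margin (e.g. a larger margin for negatives that differ in both modality and semantic category) are assumptions.

```python
import numpy as np

def multi_margin_triplet_loss(anchor, positive, negatives, margins):
    """Hinge-style triplet loss in which each negative is pushed away
    from the anchor by its own margin, drawn from a margin set according
    to how the negative relates to the anchor (hypothetical rule)."""
    d_pos = np.linalg.norm(anchor - positive)
    losses = [max(0.0, d_pos - np.linalg.norm(anchor - neg) + m)
              for neg, m in zip(negatives, margins)]
    return float(np.mean(losses))
```

With a single shared margin this reduces to the ordinary triplet loss; the margin set is what lets the network separate "wrong modality" and "wrong category" negatives to different extents.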
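The sliding-window strategy of contribution (2) can be illustrated on a token sequence; the window and stride values below are hypothetical, and the same idea applies to image regions.

```python
def sliding_windows(tokens, window, stride):
    """Split a sequence (e.g. text tokens; image patches work the same
    way) into overlapping fine-grained segments. Varying the window size
    yields the different fine-grained levels."""
    return [tokens[i:i + window]
            for i in range(0, len(tokens) - window + 1, stride)]
```

At each level the segments would feed the corresponding fine-grained sub-network, while the undivided input feeds the coarse-grained sub-network.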
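The final fusion step of contribution (2) is a linear combination of similarity matrices; the fusion weights here are hypothetical hyperparameters, not values from the thesis.

```python
import numpy as np

def fuse_similarity(matrices, weights):
    """Weighted linear fusion of the similarity matrices produced by the
    object-detection sub-network and the multi-grained sub-networks."""
    fused = np.zeros_like(matrices[0], dtype=float)
    for w, m in zip(weights, matrices):
        fused += w * np.asarray(m, dtype=float)
    return fused
```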
Keywords/Search Tags:Cross-media Retrieval, Multi-Margin Loss Function, Multi-Modal Data, Multi-Grained Data, Attention Mechanism