
Research On Cross-modal Retrieval Technology For News Scenario

Posted on: 2024-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: S R Yang
Full Text: PDF
GTID: 2568306944957929
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development and popularity of the mobile internet, massive amounts of information are created and presented in multimodal forms. News, as an important genre for recording society, disseminating information, and reflecting the times, is a valuable data asset in the digital-economy era. How to accurately retrieve the required news content from massive data, regardless of media type, is a research topic of practical significance. This thesis focuses on the news scenario and investigates cross-modal retrieval between images and texts.

The core of cross-modal retrieval is measuring the similarity between heterogeneous data. Contrastive learning, a weakly supervised method that learns a metric function from positive and negative samples, has been widely used in vision-language pre-training. However, most models assume that the text-image pairs used in training are fully literally aligned, which is inconsistent with the text-image relationship in real news. First, a news article contains textual information at several levels, which forms a loose, many-to-many matching relationship with its images. Second, the alignment between text and images in news is usually only weakly literal: a textual description may correspond to only part of an image while remaining highly correlated with it semantically.

To address these two problems, the main work and contributions of this thesis are as follows. First, for the weak literal matching between individual sample pairs, a joint contrastive learning method based on nearest neighbours is proposed. The semantic nearest neighbours in the text space serve as semantic-enhanced guidance for image learning, and intra-modal and cross-modal contrastive learning are combined to preserve semantic proximity within and across the modalities. Second, for
the loose, many-to-many alignment between text and images in news articles, a multi-level news text-image contrastive learning method based on text distance is proposed. Text-image sample pairs are constructed at the article, object, and context levels, and article-level text-image matching is learned from them. For context-level pairs with weak matching, the distance between the text and the image within the context is modelled and used to reweight the loss, making the model more consistent with real news articles. Finally, based on the above algorithms, a news multimodal retrieval system is designed and implemented. Oriented toward practical application scenarios, it comprises modules for news browsing, data management, and multimodal retrieval, providing users with several news text-image retrieval functions and meeting their multimodal data retrieval needs.
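The nearest-neighbour idea in the first method can be illustrated with a rough sketch. This is not the thesis's actual implementation: the function names, the temperature value, the batch-level nearest-neighbour search, and the use of plain NumPy on precomputed embeddings are all assumptions. The sketch combines a two-directional cross-modal InfoNCE loss with a term that aligns each image to the nearest neighbour of its paired text in the text embedding space:

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce(sim, temperature=0.07):
    """InfoNCE over a similarity matrix whose diagonal holds the positives."""
    logits = sim / temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def nn_joint_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Hypothetical sketch of nearest-neighbour joint contrastive learning.

    Cross-modal InfoNCE in both directions, plus a term that pulls each
    image toward the nearest neighbour of its paired text in the text
    space, used as semantic-enhanced guidance."""
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    cross = img @ txt.T
    loss = info_nce(cross, temperature) + info_nce(cross.T, temperature)
    # Nearest neighbour of each text among the other texts in the batch.
    txt_sim = txt @ txt.T
    np.fill_diagonal(txt_sim, -np.inf)   # exclude self-matches
    nn_txt = txt[txt_sim.argmax(axis=1)]
    loss += info_nce(img @ nn_txt.T, temperature)
    return loss
```

In practice the embeddings would come from image and text encoders trained end-to-end, and the nearest-neighbour search could draw on a memory bank rather than only the current batch.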
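The text-distance idea in the second method can be sketched in the same spirit. Again, this is an assumed illustration rather than the thesis's formulation: the reciprocal weighting scheme, the names, and the NumPy setting are mine. Each context-level text-image pair contributes to an InfoNCE-style loss with a weight that shrinks as the distance between the text snippet and the image in the article grows:

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def distance_weighted_contrastive_loss(img_emb, txt_emb, distances,
                                       temperature=0.07):
    """Hypothetical sketch: InfoNCE where each context-level pair is
    reweighted by the text-image distance within the article.

    distances: one non-negative value per pair (e.g. the number of
    paragraphs between the text snippet and the image); closer pairs
    receive a larger weight."""
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    logits = (img @ txt.T) / temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    w = 1.0 / (1.0 + np.asarray(distances, dtype=float))
    # Weighted average of the per-pair positive log-probabilities.
    return -np.sum(w * np.diag(log_prob)) / np.sum(w)
```

Down-weighting distant pairs keeps them as weak positives instead of discarding them, which matches the loose, many-to-many structure of news articles described above.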
Keywords/Search Tags: contrastive learning, cross-modal retrieval, news texts and images, semantic similarity