Research on Methods for Cross-Modal Retrieval in the Recipe Domain

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: J J Chu
Full Text: PDF
GTID: 2518306122968679
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence and multimedia technology, people's work, life, and entertainment on smart terminals have taken on a rich multimodal character, producing explosive growth in multimodal data, chiefly text, images, and audio. Research on cross-modal retrieval has therefore become a hot topic in the multimedia field. Because different modalities represent information in different forms, it is difficult for a computer to judge whether data from different modalities express the same meaning, which also makes cross-modal retrieval one of the open difficulties of the field.

Retrieving data across modalities in recipes is a common application in daily life. This thesis studies cross-modal recipe retrieval between images and text: using the recipe text as a query to retrieve the corresponding image from a candidate list, or using the image as a query to retrieve the corresponding text. Existing cross-modal recipe retrieval methods mainly learn representations of images and text separately and then project them into a common space. Because they ignore the interaction between a recipe's image and its text, their retrieval results are poor. The recipe retrieval methods based on joint attention mechanisms proposed in this thesis therefore have both research significance and application value. The main research work is as follows:

(1) Attention networks for image-text interactive recipe retrieval. We propose a parallel-attention neural network and a cross-attention network for the cross-modal recipe retrieval problem. The parallel-attention network uses the recipe's context information to learn attention weights for each part of the image and of the text separately, yielding better representations of both, which are then mapped into the common space for mutual retrieval. The cross-attention network uses the recipe image to guide attention over the text and the text to guide attention over the image; the resulting image and text representations are likewise mapped into the common space for retrieval. Both methods exploit the interaction between the recipe's image and text, which substantially improves retrieval accuracy. (Illustrative sketches of both mechanisms follow this abstract.)

(2) Recipe retrieval based on a fusion network for image-text interaction. Inspired by the effectiveness of the attention networks in (1) at using image-text interaction to improve retrieval accuracy, we propose a fusion network that integrates the representations produced by the two methods above into more accurate representation features (see the fusion sketch below). Experiments show that this fusion significantly improves image-text retrieval accuracy over either method alone.

Finally, experiments on two datasets demonstrate the effectiveness and rationality of the proposed methods, through both overall performance comparison and fine-grained analysis.
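To make the parallel-attention idea concrete, here is a minimal PyTorch sketch: each modality gets its own attention head that scores its parts (image regions, text tokens) against a learned context vector, and the attended features are projected into a shared space. The module name, feature dimensions, and the learned context vector (standing in for the thesis's recipe-context encoding, which the abstract does not specify) are illustrative assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of the parallel-attention branch; names and dims are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelAttention(nn.Module):
    """Independent ("parallel") attention per modality, then a common space."""
    def __init__(self, img_dim=2048, txt_dim=1024, common_dim=512):
        super().__init__()
        self.img_ctx = nn.Parameter(torch.randn(img_dim))  # hypothetical context vector for image regions
        self.txt_ctx = nn.Parameter(torch.randn(txt_dim))  # hypothetical context vector for text tokens
        self.img_proj = nn.Linear(img_dim, common_dim)     # image feature -> common space
        self.txt_proj = nn.Linear(txt_dim, common_dim)     # text feature  -> common space

    @staticmethod
    def attend(feats, ctx):
        # feats: (batch, n_parts, dim); ctx: (dim,)
        scores = feats @ ctx                    # relevance of each part: (batch, n_parts)
        weights = F.softmax(scores, dim=1)      # attention weights over parts
        return (weights.unsqueeze(-1) * feats).sum(dim=1)  # weighted sum: (batch, dim)

    def forward(self, img_regions, txt_tokens):
        # img_regions: (batch, n_regions, img_dim); txt_tokens: (batch, n_tokens, txt_dim)
        img_vec = self.img_proj(self.attend(img_regions, self.img_ctx))
        txt_vec = self.txt_proj(self.attend(txt_tokens, self.txt_ctx))
        # L2-normalize so cosine similarity can rank candidates at retrieval time
        return F.normalize(img_vec, dim=-1), F.normalize(txt_vec, dim=-1)
```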
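The cross-attention branch can be sketched similarly: a pooled summary of one modality scores the parts of the other, so the image guides attention over the text and the text guides attention over the image. Using mean pooling for the guiding query, and these dimensions, are assumptions made for illustration, not the thesis's design.

```python
# A sketch of the cross-guidance idea; the mean-pooled query is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Each modality attends under the guidance of the other modality."""
    def __init__(self, img_dim=2048, txt_dim=1024, common_dim=512):
        super().__init__()
        self.img_to_q = nn.Linear(img_dim, txt_dim)   # image summary -> query over text tokens
        self.txt_to_q = nn.Linear(txt_dim, img_dim)   # text summary  -> query over image regions
        self.img_proj = nn.Linear(img_dim, common_dim)
        self.txt_proj = nn.Linear(txt_dim, common_dim)

    @staticmethod
    def guided_pool(feats, query):
        # feats: (batch, n_parts, dim); query: (batch, dim) from the other modality
        scores = torch.einsum('bnd,bd->bn', feats, query)
        weights = F.softmax(scores, dim=1)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)

    def forward(self, img_regions, txt_tokens):
        img_global = img_regions.mean(dim=1)          # coarse image summary as the guiding query
        txt_global = txt_tokens.mean(dim=1)           # coarse text summary as the guiding query
        txt_vec = self.guided_pool(txt_tokens, self.img_to_q(img_global))  # image guides text
        img_vec = self.guided_pool(img_regions, self.txt_to_q(txt_global)) # text guides image
        return F.normalize(self.img_proj(img_vec), dim=-1), \
               F.normalize(self.txt_proj(txt_vec), dim=-1)
```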
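Finally, one plausible shape for the fusion step: concatenate each modality's parallel-attention and cross-attention embeddings, project them back into the common space, and train with the bi-directional triplet ranking loss that is standard for common-space cross-modal retrieval. The concatenation-based fusion, the Tanh projection, and the margin value are assumptions; the abstract does not specify them.

```python
# A sketch of the fusion head plus a standard retrieval loss; details are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Fuse the parallel- and cross-attention embeddings of one modality."""
    def __init__(self, common_dim=512):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * common_dim, common_dim), nn.Tanh())

    def forward(self, parallel_vec, cross_vec):
        fused = self.fuse(torch.cat([parallel_vec, cross_vec], dim=-1))
        return F.normalize(fused, dim=-1)

def triplet_ranking_loss(img, txt, margin=0.2):
    # img, txt: (batch, dim), L2-normalized; matching pairs share a batch index
    sim = img @ txt.t()                             # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)                   # similarity of each true pair
    cost_t = (margin + sim - pos).clamp(min=0)      # hinge over negatives, image -> text
    cost_i = (margin + sim - pos.t()).clamp(min=0)  # hinge over negatives, text -> image
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost_t.masked_fill(mask, 0).mean() + cost_i.masked_fill(mask, 0).mean()
```

At retrieval time, the fused embeddings on both sides are compared by cosine similarity, so ranking the candidate list reduces to a single matrix product, in both the text-to-image and image-to-text directions.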
Keywords/Search Tags:Recipe retrieval, Parallel-attention network, Cross-attention network, Fusion network, Cross-modal retrieval