
Deep Learning-based Fine-grained Cross-media Retrieval

Posted on: 2022-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: J M Bai  Full Text: PDF
GTID: 2518306752496994  Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet, web data now takes increasingly varied forms, including images, text, video and audio, and users' demands for cross-media retrieval have become more flexible. Cross-media retrieval has attracted considerable research attention, but existing work focuses mainly on coarse-grained retrieval. Fine-grained cross-media retrieval has received less study, so current methods cannot meet the requirements of practical applications. As a new research direction, fine-grained cross-media retrieval must address not only the media gap but also the small differences between subcategories and the large differences within subcategories that characterize fine-grained tasks. To address these two main problems, this thesis studies fine-grained cross-media retrieval in depth. Its main contributions are as follows:

(1) Because most existing cross-media retrieval methods ignore the fine-grained features of the data, we propose a multi-model network for fine-grained cross-media retrieval (MMNT). The method designs a proprietary network for each medium and a common network shared by the four media types, so that both the media-specific and the shared features are taken into account. The model learns the correlations between different media while also learning the specific attributes of each medium, which effectively improves the accuracy of cross-media retrieval.

(2) Because most existing fine-grained cross-media retrieval methods ignore the semantic information expressed by the text, we propose a deep supervision and feature fusion method for fine-grained cross-media retrieval (DSFF). The method uses the label information and semantic information of the data to learn, through a deep supervision network, the correlations between the features of different media in both the label space and the semantic space. It minimizes a classification loss, a discriminant loss and a triplet loss to eliminate the media gap while preserving the differences between samples of different semantic categories. In addition, the method measures similarity using a combination of label features and semantic features, which further improves cross-media retrieval performance.

(3) To address the difficulty of accurately extracting fine-grained features, we propose an attention mechanism and modal dependence method for fine-grained cross-media retrieval (AMMD). The method introduces an attention mechanism and uses image data as the intermediate medium to explore the latent relationships within the same medium and between different media. It also proposes a key-frame-based video denoising method that obtains a clean data set through sample selection, which improves the accuracy of cross-media retrieval.
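
The abstract names the loss terms and the fused similarity measure in contribution (2) but gives no formulas. As a rough illustration only, the sketch below shows one common form of a triplet loss and one possible way to combine label-space and semantic-space similarity. The function names, the squared Euclidean distance, the `margin` value, the cosine similarity and the weight `alpha` are all assumptions made for this example and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not the thesis's definition).

    The loss is zero once the positive sample is closer to the anchor
    than the negative sample by at least `margin`.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def fused_similarity(query, candidate, alpha=0.5):
    """Weighted sum of label-space and semantic-space cosine similarity.

    `query` and `candidate` are dicts holding a "label" vector and a
    "semantic" vector; `alpha` is a hypothetical mixing weight.
    """
    label_sim = cosine(query["label"], candidate["label"])
    semantic_sim = cosine(query["semantic"], candidate["semantic"])
    return alpha * label_sim + (1 - alpha) * semantic_sim


# Toy embeddings in a shared space: an image, a text of the same
# subcategory, and a text of a different subcategory.
image_emb = [1.0, 0.0]
text_same = [0.9, 0.1]
text_other = [0.0, 1.0]

loss_well_separated = triplet_loss(image_emb, text_same, text_other)
loss_violated = triplet_loss(image_emb, text_other, text_same)
```

In this toy case the first triplet is already separated by more than the margin, so it contributes no loss, while the second triplet (with positive and negative swapped) is penalized. A real system would compute these terms on learned network outputs and add the classification and discriminant losses mentioned in the abstract.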
Keywords/Search Tags:fine-grained, cross-media retrieval, media gap, feature fusion, attention mechanism