
Global And Local Collaborative Attention For Image And Text Matching Network

Posted on: 2021-02-26    Degree: Master    Type: Thesis
Country: China    Candidate: Y S Luo    Full Text: PDF
GTID: 2428330611465656    Subject: Software engineering
Abstract/Summary:
Traditional information retrieval is based on keyword matching within a single modality: the query set and result set of text retrieval are both text, and likewise for image retrieval. Faced with the growing variety of data on the Internet, this single-modal approach can no longer meet users' needs, and cross-modal retrieval has emerged. The image-text retrieval problem studied in this thesis moves from pure text retrieval or pure image retrieval to retrieval across the image and text modalities. However, most current image-text retrieval algorithms not only have complex, redundant structures, but also match with only a single level of information, either global (global matching methods) or local (local matching methods), ignoring the correlation between global and local semantics. To address these problems, this thesis proposes a simple and effective neural network that can align visual and textual information effectively; moreover, its structure can easily be transferred to other models to improve their retrieval ability. The contributions of this thesis are as follows:

1) A global and local collaborative attention network is proposed. It has a simple structure and contains two collaborative attention mechanisms, ii-tt and it-ti. The ii-tt mechanism enhances the expressive power of the features by extracting the correlation between the global image/text and the local regions/words, while the it-ti mechanism effectively aligns visual and textual features by exploiting the matching relationship between the global image/text and the local words/regions. By using collaborative attention to combine the global and local information of image and text, the network overcomes the respective shortcomings of the local matching and global matching methods and improves retrieval performance.

2) The mainstream image-text retrieval models SCAN and RDAN are improved, yielding the corresponding GL-SCAN and GL-RDAN models. Within the global and local collaborative attention network, the collaborative attention structure has good scalability and can be transferred effectively to other image-text retrieval models. When SCAN and RDAN match images and text, they focus mainly on matching local image and text information to the maximum extent, while ignoring the correlation between global and local semantic information. By migrating the collaborative attention structure into SCAN and RDAN, this thesis mines the correlation between the global and local information of image and text and enhances the retrieval ability of both models.

3) On the public datasets Flickr30K and MS-COCO, the effectiveness of the proposed network and the improved models is verified through several comparative experiments.
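The core idea behind both collaborative attention mechanisms, letting a global image or text representation guide attention over local regions or words, can be illustrated with a minimal numerical sketch. This is not the thesis's implementation: the function name, dimensions, and dot-product scoring are illustrative assumptions, showing only one direction of the global-to-local attention described above.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def global_local_attention(global_feat, local_feats):
    # global_feat: (d,) global image or sentence embedding (assumed)
    # local_feats: (n, d) region or word embeddings (assumed)
    scores = local_feats @ global_feat    # (n,) relevance of each local unit
    weights = softmax(scores)             # attention distribution over locals
    attended = weights @ local_feats      # (d,) globally guided local summary
    return attended, weights

# toy example: a global image vector attends over 4 region vectors
rng = np.random.default_rng(0)
g = rng.standard_normal(8)
regions = rng.standard_normal((4, 8))
summary, w = global_local_attention(g, regions)
```

The same routine applied with a global sentence vector over word embeddings would sketch the text side; aligning the two attended summaries across modalities is the role of the it-ti mechanism in the thesis.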
Keywords/Search Tags: Image-Text Retrieval, Global and Local Semantics, Collaborative Attention