
Research On Image-text Retrieval Based On Attention Mechanism And Adversarial Learning

Posted on: 2021-03-07  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Ma  Full Text: PDF
GTID: 2428330623969226  Subject: Computer technology
Abstract/Summary:
Image-text retrieval plays an important role in an increasingly rich digital world, and both academia and industry regard it as an important area of technical investment. Building on the strengths and weaknesses of existing attention mechanisms and on the structure of the discriminator in Generative Adversarial Networks (GAN), this thesis designs an image feature prediction network, a stacked cross attention module, and a feature source discrimination module, and proposes an image-text retrieval algorithm based on stacked cross attention and adversarial learning. First, this thesis uses the image feature space directly as the common space, whereas traditional methods must learn a separate common space. Second, it introduces a stacked cross attention mechanism to compute the fine-grained similarity between local image regions and each word. Image-text similarity is computed at two levels, coarse-grained and fine-grained. Without greatly increasing computational complexity, this optimizes the prediction of image features from text. It also addresses two shortcomings of the traditional attention mechanism: it computes by exhaustive enumeration, and it can handle only one pair of local image region and word in each preset step. Finally, this thesis designs a feature source discriminator and optimizes the discriminator loss and the generator loss through adversarial learning. The generator loss is the cross-media similarity loss, which has a coarse-grained (global) part and a fine-grained (local) part. The effect of the attention mechanism and of adversarial learning on the image-text retrieval model is tested on the mainstream cross-media retrieval datasets MS-COCO and Flickr30K, together with a visual analysis. Experiments show that stacked cross attention matches fine-grained hierarchical features well and greatly improves the prediction of image features from text features, which in turn improves image-text retrieval accuracy. The experimental results also show that adversarial learning slightly improves the accuracy of the retrieval model.
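The fine-grained similarity between local image regions and words can be illustrated with a minimal sketch. It is a generic cross-attention formulation and not the thesis's actual implementation: each word attends over the image regions, and the word is then compared with its attention-weighted region mix. The function names, the `sharpness` parameter, and the two-dimensional toy features are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(scores, sharpness=1.0):
    """Numerically stable softmax; `sharpness` (hypothetical) scales the scores."""
    scaled = [s * sharpness for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fine_grained_similarity(regions, words, sharpness=4.0):
    """Mean, over words, of the cosine between each word vector and the
    attention-weighted mix of region vectors that the word attends to."""
    per_word = []
    for word in words:
        # Attention weights of this word over all image regions.
        weights = softmax([cosine(word, r) for r in regions], sharpness)
        # Attended image vector: weighted sum of region features.
        attended = [sum(w * r[d] for w, r in zip(weights, regions))
                    for d in range(len(word))]
        per_word.append(cosine(word, attended))
    return sum(per_word) / len(per_word)

# Toy example: two region features and two candidate captions (assumed data).
regions = [[1.0, 0.0], [0.0, 1.0]]
matching_caption = [[1.0, 0.0], [0.0, 1.0]]
unrelated_caption = [[-1.0, 0.0], [0.0, -1.0]]
print(fine_grained_similarity(regions, matching_caption))
print(fine_grained_similarity(regions, unrelated_caption))
```

In this sketch a caption whose words align with some region scores higher than one whose words align with none, which is the behaviour a fine-grained (local) similarity term is meant to capture. A coarse-grained (global) term would compare whole-image and whole-sentence features instead.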
Keywords/Search Tags:image-text retrieval, common space, attention, generative adversarial mechanism