
A News Image Set Captioning Method Based On Clustering And Image-Text Bidirectional Guidance Attention

Posted on: 2024-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2568307178491374
Subject: Computer Science and Technology
Abstract/Summary:
Image captioning has long been a research hotspot at the intersection of computer vision and natural language processing; its goal is to generate a text description for an image. In the traditional image captioning task, existing methods remain at a shallow level and lack the guidance of real-world knowledge, so it is difficult for them to mine the logical and semantic relationships among objects in a specific context. Introducing news text brings new possibilities for image captioning, but it also demands greater learning ability from models. In addition, news data often contains multiple closely related images, which makes existing single-image captioning methods unsuitable for the news image set captioning task.

This paper proposes a news image set captioning method that takes the image set as the research object and the corresponding news text as background knowledge to generate a long, explanatory description. To address the poor fit of current image-text interaction mechanisms when learning from multiple images and text, image-text bidirectional guidance attention is introduced in two forms: coarse-grained and fine-grained. Both combine external text for bidirectional guidance and attention, selecting images, image content, and text information at coarse and fine granularity, so that the model can make full use of image and text information and effectively filter for key information. To address the fact that implicit relationships between images go unexplored during image set captioning, a clustering algorithm is introduced to explore the structured information between images, and the clustering results are combined with image-text bidirectional guidance attention in different ways. The method also attempts to use a pointer network, combined with the image-text attention weights, to automatically learn and dynamically adjust the probability of selecting a word versus generating one, and to guide the generation of named-entity words.

This paper explores how the placement, the target, and the combination mode of the image-text bidirectional guidance attention and the clustering module affect the generated descriptions. Because existing datasets are not suitable for this task, a news image set description dataset is built using a domestic news image set website as the data source. Experimental results on this dataset show that introducing image-text bidirectional guidance attention effectively improves the quality of the description text, and that incorporating the clustering algorithm further improves the generated descriptions, indicating that this research is meaningful and effective.
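
The pointer-network step mentioned in the abstract, which adjusts the probability of selecting a word versus generating one, can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so the code below assumes the commonly used pointer-generator mixture: a gate `p_gen` weights the decoder's vocabulary distribution, and the remaining mass `1 - p_gen` is spread over the source news-text tokens according to their attention weights. The function name `pointer_mixture`, the toy vocabulary, and all numbers are hypothetical.

```python
def pointer_mixture(p_gen, vocab_probs, attention, source_tokens):
    """Blend a generation distribution with a selection (copy) distribution.

    Assumed formulation, not taken from the thesis:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w

    p_gen:         probability of generating from the vocabulary (0..1)
    vocab_probs:   dict mapping word -> decoder softmax probability
    attention:     attention weights over the source tokens (sums to 1)
    source_tokens: news-text tokens aligned one-to-one with `attention`
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must align")
    # Generation part: scale every vocabulary probability by the gate.
    final = {word: p_gen * prob for word, prob in vocab_probs.items()}
    # Selection part: add the attention mass of each source token,
    # which lets out-of-vocabulary named entities receive probability.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Hypothetical toy inputs: "Li" and "Beijing" are not in the vocabulary.
vocab_probs = {"the": 0.5, "mayor": 0.3, "visited": 0.2}
source_tokens = ["Li", "visited", "Beijing"]
attention = [0.6, 0.1, 0.3]
dist = pointer_mixture(0.4, vocab_probs, attention, source_tokens)
best_word = max(dist, key=dist.get)
```

With a low gate value, the named entity carrying the highest attention weight wins over every in-vocabulary word, which is the behaviour the abstract describes as guiding the generation of named-entity words.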
Keywords/Search Tags:Image Captioning, Image-Text Bidirectional Guidance Attention, Clustering Algorithm, Image Set, News Text