
Research On Image Captioning Based On Entity Semantic Information

Posted on: 2021-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J S Zhang
GTID: 2428330605976505 | Subject: Software engineering
Abstract/Summary:
An image caption is a sentence that describes the content of an image. It is also called an image annotation or image title. The task of image captioning is to generate such captions automatically. Existing methods identify general entities (the objects in the image) inaccurately and lack information about named entities. As a result, they either misrecognize objects in the image or produce only simple, generic captions, which limits the use of image captioning technology in real-world scenarios. To address these two problems, this thesis proposes an image captioning method based on entity semantic information. The work consists of the following three parts.

(1) Research on Image Captioning Based on a Bidirectional Attention Mechanism
In existing methods, the attention mechanism selects important local image-region features based on the semantic information at the current time step, and the decoder then relies on its own decoding ability to turn these features into a sentence. However, a unidirectional attention mechanism cannot check whether the semantic information is consistent with the image content, so the generated captions lack accuracy. To solve this problem, this thesis proposes an image captioning method based on a bidirectional attention mechanism. It adds an attention computation from the image features to the semantic information, which adjusts the semantic information in the decoder according to the image content and thereby yields more accurate captions. We conduct experiments on two standard image captioning datasets, MSCOCO and Flickr30k. Compared with the baseline models, our method improves the BLEU-4 score by 1.5% on MSCOCO and 0.8% on Flickr30k, and its performance is comparable to that of leading international models.

(2) Extracting and Filling Person-Type Named Entities for Image Captioning
A precise caption often contains named-entity information. For example, "Messi takes a penalty" identifies the specific protagonist in the image. For the same image, however, the existing method produces "a player on the football field". Although this caption summarizes the theme of the image, it is relatively simple and clearly lacks the specific subject. To address this problem, this thesis proposes a method for extracting and filling person-type named entities for image captioning. Specifically, we first generate an initial caption containing an empty slot (the person-type named entity to be filled). We then recast person-type named-entity extraction as a question answering problem: a machine reading comprehension model extracts the person-type named entity from related documents, and the extracted entity fills the empty slot. We construct the dataset by crawling (image, caption, related document) triples from Wikipedia. Experimental results on this dataset show that the extraction accuracy for person-type named entities reaches 52.31%, and that our method improves the BLEU-4 score by 2.93% over the baseline model.

(3) Research on Generating Multi-Type Named Entities for Image Captioning
A caption containing multiple named entities conveys richer information. For example, "Liu Xiang won the 110-meter hurdles at the 2004 Athens Olympics" contains named entities of several types, such as person, time and event. We therefore study image captioning methods that generate captions containing multiple types of named entities. Existing methods use a two-stage strategy: they first generate a template and then fill in the named entities. In this thesis, we instead cast the acquisition and filling of named entities as a generation problem, so that an end-to-end model generates the final caption directly. Experiments on the GoodNews dataset show that our method outperforms the state-of-the-art model in BLEU score.

Of these three parts, the first improves the accuracy of general entities in image captions, while the second and third extend caption generation from a single type of named entity to multiple types. Together, this research is expected to broaden the application of image captioning technology in real-world scenarios.
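The abstract does not give equations for the bidirectional attention in part (1). The Python sketch below shows one plausible reading, assuming scaled dot-product attention in both directions: the decoder hidden state attends over image-region features to form an image context, and that image context then attends over a set of semantic vectors to produce an image-grounded adjustment of the hidden state. The function names, the dot-product scoring, and the additive fusion step are assumptions for illustration, not the thesis's formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention; the keys also serve as the values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(keys[0])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    return weights, context


def bidirectional_step(
    hidden: Vector, regions: List[Vector], semantics: List[Vector]
) -> Tuple[List[float], List[float], Vector]:
    # Direction 1 (conventional): the decoder's semantic state attends
    # over the image-region features to build an image context vector.
    region_weights, image_context = attend(hidden, regions)
    # Direction 2 (the added direction): the image context attends over
    # the semantic vectors, scoring them by agreement with the image.
    semantic_weights, semantic_context = attend(image_context, semantics)
    # Assumed fusion: add the image-grounded semantic summary to the
    # hidden state (hidden and semantic vectors share one dimension here).
    adjusted_hidden = [h + s for h, s in zip(hidden, semantic_context)]
    return region_weights, semantic_weights, adjusted_hidden


# Toy usage: two image regions and two semantic vectors in 2-D.
region_w, semantic_w, new_hidden = bidirectional_step(
    hidden=[1.0, 0.0],
    regions=[[1.0, 0.0], [0.0, 1.0]],
    semantics=[[1.0, 0.0], [0.0, 1.0]],
)
print([round(w, 3) for w in region_w])
```

In a trained model the queries, keys and values would normally pass through learned projections; they are omitted here so that the two attention directions stay easy to follow.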
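Part (2) describes a slot-then-fill pipeline: generate a caption with an empty person slot, pose the slot as a question, extract the answer from a related document, and fill the slot. The sketch below shows only that control flow under stated assumptions. The slot token `<PERSON>`, the question rule in `slot_to_question`, and `naive_extractor` are all hypothetical. In particular, `naive_extractor` is a toy capitalized-word heuristic standing in for the machine reading comprehension model; it ignores the question and is not an MRC model.

```python
import re
from typing import Callable

SLOT = "<PERSON>"  # hypothetical token marking the empty person slot

# An extractor maps (question, document) to an answer span. In the thesis
# this role is played by a machine reading comprehension model.
Extractor = Callable[[str, str], str]

# Hypothetical list of capitalized sentence openers to skip.
_STOPWORDS = {"The", "A", "An", "In", "On", "At"}


def slot_to_question(template: str) -> str:
    """Hypothetical rule: replace the slot with 'who' to form a question."""
    question = template.replace(SLOT, "who").strip().rstrip(".")
    return question[0].upper() + question[1:] + "?"


def naive_extractor(question: str, document: str) -> str:
    """Toy stand-in, NOT an MRC model: ignores the question and returns the
    first run of capitalized words that is not a common sentence opener."""
    for span in re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", document):
        if span not in _STOPWORDS:
            return span
    return ""


def fill_slot(template: str, document: str, extractor: Extractor) -> str:
    # Step 1: recast entity extraction as a question about the slot.
    question = slot_to_question(template)
    # Step 2: extract an answer span from the related document.
    answer = extractor(question, document)
    # Step 3: fill the empty slot with the extracted entity.
    return template.replace(SLOT, answer, 1)


caption = fill_slot(
    "<PERSON> takes a penalty",
    "In the final, Lionel Messi takes a penalty against Real Madrid.",
    naive_extractor,
)
print(caption)
```

Because the extractor is passed in as a parameter, replacing the toy heuristic with an actual question answering model would leave `fill_slot` unchanged, which reflects the benefit of casting extraction as question answering.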
Keywords/Search Tags: Image Captioning, General Entity, Attention Mechanism, Named Entity, Machine Reading Comprehension