Image Caption Based On Multimodal Attention

Posted on: 2020-07-19  Degree: Master  Type: Thesis
Country: China  Candidate: G W Chen  Full Text: PDF
GTID: 2428330599952935  Subject: engineering
Abstract/Summary:
Image captioning is a fundamental task that connects computer vision and natural language processing, and it is widely applied in artificial intelligence. Existing image captioning algorithms usually extract features through transfer learning and use only visual features to generate descriptions, so the generated descriptions are inaccurate and lack richness. In addition, image captioning algorithms based on the attention mechanism have complex structures and are difficult to train. In this paper, we propose an image captioning algorithm based on multimodal attention. We first construct the keyword categories and a keyword dataset from the image caption dataset, and train a keyword-based image feature extraction model on the keyword dataset to extract more accurate image features. We then propose two independent image captioning algorithms, one based on keyword attention and one based on spatial attention. The spatial attention-based algorithm generates descriptions from high-level features, while the keyword attention-based algorithm uses keyword guidance to generate descriptions. Finally, we propose an image captioning algorithm based on multimodal attention by combining the spatial attention-based and keyword attention-based algorithms. Specifically, spatial attention is used to obtain better visual features, and keyword attention is used to guide the generation of descriptions. To demonstrate the effectiveness of our approach, extensive experiments are conducted on the Microsoft COCO dataset. The results show that our algorithm produces more accurate and richer descriptions and significantly outperforms all the compared state-of-the-art image captioning methods.
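
The abstract does not give the attention formulas, so the following is only a minimal sketch of how spatial attention and keyword attention could be combined at one decoding step. It assumes dot-product scoring, softmax weights, and fusion by concatenation; these choices and all names (`attend`, `multimodal_context`, the toy vectors) are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Soft attention: score each feature vector against the query
    (dot product, an assumption), then return the weighted sum."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


def multimodal_context(hidden, region_features, keyword_embeddings):
    """Combine a spatial-attention context and a keyword-attention context.
    Concatenation is an assumed fusion; the thesis may fuse differently."""
    visual_ctx, region_w = attend(hidden, region_features)
    keyword_ctx, keyword_w = attend(hidden, keyword_embeddings)
    return visual_ctx + keyword_ctx, region_w, keyword_w


# Toy inputs: a 2-d decoder state, three image regions, two keyword embeddings.
hidden = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
keywords = [[2.0, 0.0], [0.0, 2.0]]

context, region_w, keyword_w = multimodal_context(hidden, regions, keywords)
print(len(context), region_w, keyword_w)
```

In a captioning model the fused context would typically be fed, together with the previous word, into the decoder to predict the next word; that step is omitted here.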
Keywords/Search Tags: Image Caption, Computer Vision, Natural Language Processing, Multimodal, Attention Mechanism