Image Caption Based On Multimodal Attention

Posted on:2020-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:G W Chen

Full Text:PDF

GTID:2428330599952935

Subject:engineering

Abstract/Summary:

Image captioning is a fundamental task that connects computer vision and natural language processing, and it is widely applied in artificial intelligence. Existing image captioning algorithms usually extract features through transfer learning and use only visual features to generate descriptions, so the generated descriptions are inaccurate and lack richness. In addition, attention-based image captioning algorithms have complex structures and are difficult to train. In this paper, we propose an image captioning algorithm based on multimodal attention. We first construct the keyword categories and a keyword dataset from the image caption dataset, and train a keyword-based image feature extraction model on the keyword dataset to extract more accurate image features. We then propose two independent image captioning algorithms, based on keyword attention and spatial attention, respectively. The spatial attention-based algorithm generates descriptions from high-level features, while the keyword attention-based algorithm uses keyword guidance to generate descriptions. Finally, we propose an image captioning algorithm based on multimodal attention by combining the spatial attention-based and keyword attention-based algorithms. Specifically, spatial attention is used to obtain better visual features, and keyword attention is employed to guide the generation of the descriptions. To demonstrate the effectiveness of our approach, extensive experiments are conducted on the Microsoft COCO dataset. The results show that our algorithm produces more accurate and richer descriptions and significantly outperforms all the compared state-of-the-art image captioning methods.
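
The combination of spatial attention and keyword attention described above can be illustrated with a minimal sketch. The abstract does not specify the scoring function or how the two attention outputs are fused, so the dot-product scoring, the concatenation step, and all names here (`attend`, `multimodal_context`, the toy vectors) are assumptions made for illustration only, not the thesis's actual model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Soft attention: return weights over items and their weighted sum.

    Dot-product scoring is an assumption; the abstract does not say
    which scoring function is used.
    """
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(dim)
    ]
    return weights, context


def multimodal_context(hidden, region_feats, keyword_embs):
    """Combine spatial attention and keyword attention for one decoding step.

    hidden       -- hypothetical decoder state used as the attention query
    region_feats -- one feature vector per image region (spatial attention)
    keyword_embs -- one embedding per predicted keyword (keyword attention)
    """
    spatial_w, visual_ctx = attend(hidden, region_feats)
    keyword_w, keyword_ctx = attend(hidden, keyword_embs)
    # Fusion by concatenation is an assumed choice for this sketch.
    fused = visual_ctx + keyword_ctx
    return fused, spatial_w, keyword_w


# Toy inputs: 3 image regions and 2 keywords, all 2-dimensional.
hidden = [1.0, 0.0]
region_feats = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
keyword_embs = [[0.0, 1.0], [3.0, 0.0]]

fused, spatial_w, keyword_w = multimodal_context(hidden, region_feats, keyword_embs)
print(len(fused), spatial_w, keyword_w)
```

In a captioning decoder, a vector like `fused` would be fed to the language model at each step, so that the next word depends both on where the model looks in the image and on which keywords are currently most relevant.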

Keywords/Search Tags:

Image Caption, Computer Vision, Natural Language Processing, Multimodal, Attention Mechanism

Related items

1	Image Caption Generation Based On Attention Mechanism
2	Research On Image Caption Method Based On Attention Mechanism
3	Research On Image Description Generation Based On Visual Attention
4	Research On Image Caption Generation Method Based On Deep Learning
5	Research Of Image Automatically Caption Algorithm Based On Deep Learning
6	Research On Image Caption Generation Based On Deep Learning
7	Research On Multimodal Interaction Model And Optimization Method For Visual Question Answerin
8	Image To Language:Auto Image Captioning Using Bi-directional LSTM And Deep Attention Neural Networks
9	Research On The Generation Method Of Chinese Image Description Based On Dual Attention Mechanism
10	Research On Image Description Generation Algorithm Based On Attention Mechanism