Image Captioning Based On Self-Attention Network

Posted on: 2022-01-08

Degree: Master

Type: Thesis

Country: China

Candidate: Y Z Li

GTID: 2558306914962529

Subject: Computer technology

Abstract/Summary:

The image captioning task aims to generate descriptive natural-language sentences for a given image. It is a multimodal task that connects computer vision and natural language processing, and progress on it is crucial for bridging the semantic gap between images and text. In recent years, with the development of deep learning, the encoder-decoder architecture has been widely used for image captioning, with a Convolutional Neural Network (CNN) as the encoder and a Recurrent Neural Network (RNN) as the decoder. However, the inherently sequential structure of the RNN causes a memory decay problem. As decoding proceeds, information from earlier time steps fades, so at each step the model retains less of the preceding context than it did at the previous step. In addition, the decoder uses only the visual features of the image during decoding and ignores the spatial relationships between objects in the image. To address these problems, this thesis carries out the following work.

First, the thesis adopts an image captioning model based on a self-attention network. It designs and implements an inter-modal attention module to address the weak cross-modal interaction of the traditional self-attention mechanism. It also designs and implements an intra-modal attention module to solve the problem of the query vector vanishing as it propagates forward through the stacked self-attention layers. Performance and ablation experiments were run on the MS-COCO and Flickr30K datasets. Quantitative and qualitative analyses of these results verify the effectiveness of the proposed intra-modal and inter-modal attention modules.

Second, the thesis designs an image captioning model that incorporates the spatial information of images. During decoding, the model lacks information about the spatial relationships between objects in the image. To fill this gap, a spatial position encoding module based on intersection over union (IoU) is proposed to enrich the image feature information available to the model. Performance and ablation experiments, together with qualitative and quantitative analyses, verify the effectiveness of the proposed IoU-based spatial position encoding.

Finally, based on the above models, the thesis designs and implements an image captioning system. The system supports online model selection and generates descriptions for uploaded images. The system's functions are tested. The experimental results show that the system can generate accurate and detailed descriptions using the proposed image captioning models.
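The abstract does not give the equations of the inter-modal and intra-modal attention modules. As a rough illustration only, the sketch below shows standard scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, used in two ways:

- intra-modal: image regions attend to other image regions;
- inter-modal: word features act as queries over region features.

The residual addition is an assumption. It is one common way to keep the query signal from fading in stacked layers, but it is not necessarily the thesis's design. The helper names (`attention`, `add_residual`) and the toy feature values are hypothetical. Learned projection matrices and multiple heads are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def add_residual(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise x + y (hypothetical residual connection)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]


# Hypothetical toy features: 3 image regions and 2 words, dimension 4.
regions = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
words = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# Intra-modal: queries, keys and values all come from the image regions.
# Adding the input back (residual) carries the original query information through.
region_ctx = add_residual(regions, attention(regions, regions, regions))

# Inter-modal: words supply the queries; image regions supply keys and values.
word_ctx = attention(words, region_ctx, region_ctx)
print(len(word_ctx), len(word_ctx[0]))  # 2 4
```

Each word gets one context vector of the region dimension. This context vector is what a decoder would consume at that position.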
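The IoU of two bounding boxes is the area of their intersection divided by the area of their union. The abstract does not specify how the IoU-based spatial position encoding is built. The sketch below therefore covers only a plausible first step: a pairwise IoU matrix over detected region boxes, with boxes given as (x1, y1, x2, y2) corners (an assumed convention). Turning this matrix into an encoding is also an assumption and is not shown; one option would be adding it as a bias to region-to-region attention scores. The function names and boxes are hypothetical.

```python
from typing import List, Tuple

# Assumed box convention: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_matrix(boxes: List[Box]) -> List[List[float]]:
    """Pairwise IoU between regions: entry [i][j] = IoU(box i, box j)."""
    return [[iou(a, b) for b in boxes] for a in boxes]


# Hypothetical region boxes: two overlapping boxes and one disjoint box.
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 4, 5, 5)]
m = iou_matrix(boxes)
print(round(m[0][1], 3), m[0][2], m[1][1])  # 0.143 0.0 1.0
```

Boxes 0 and 1 overlap in a 1x1 square out of a union of area 7, giving an IoU of 1/7. Box 2 shares no area with either box, so its off-diagonal entries are 0. The matrix is symmetric with ones on the diagonal.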

Keywords/Search Tags:

deep learning, image captioning, self-attention network, spatial information encoding

Related items

1	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion
2	Research And Application Of Image Paragraph Captioning Based On Relations Encoding And Attention Mechanism
3	Design And Implementation Of Image Captioning Model Based On Deep Learning
4	Research On Image Captioning Algorithm Based On Encoding And Decoding
5	Research On Image Captioning Algorithms Based On Deep Learning
6	Image Chinese Captioning Model Based On Deep Learning
7	Research On Social Image Captioning Based On Deep Learning
8	Research On Image Captioning Method Based On Deep Learning
9	Research On Image Captioning Algorithm Based On Deep Learning
10	Research On Academic Figure Captioning Based On Deep Learning