
Research On Image Captioning Based On Neural Network

Posted on: 2021-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zhu
Full Text: PDF
GTID: 2518306476953369
Subject: Computer technology
Abstract/Summary:
Image captioning is exactly what its name implies: given an image, the computer automatically generates text that describes the content of the image. This task is easy for humans but very challenging for machines, as it requires combining computer vision and natural language processing to convert image content into descriptive text. Image captioning has a wide range of application scenarios and huge application prospects, in fields such as human-computer interaction, image indexing, intelligent monitoring, video annotation, and visual assistance.

In recent years, the Encoder-Decoder framework based on deep learning has made significant progress on the image captioning task. Recently, several studies have reported that caption models based on self-attention achieve state-of-the-art results. Compared with traditional recurrent neural network (RNN) based models, self-attention based models solve the time-dependency problem through the attention mechanism, so they can be trained efficiently in parallel and also achieve better performance in context modeling. However, self-attention requires computation quadratic in the sentence length.

This thesis studies and explores image captioning methods based on the Encoder-Decoder framework, combined with deep neural network technologies. The main work and contributions of this dissertation are as follows:

1. An image captioning model based on lightweight convolution and dynamic convolution was proposed. We apply lightweight convolution and dynamic convolution to the image captioning task as an alternative architecture to self-attention, decreasing the computational cost from O(N²) to O(N), where N is the sentence length.

2. A set of adaptive attention strategies was proposed to guide the model to extract image features from different positions at different time steps. The model can also decide whether to use visual information or the semantic information of the generated text to predict the current word. We further enhance the performance of the attention module by adding two-dimensional position information to the image features.

3. The proposed model was evaluated on the MSCOCO dataset, using a CNN-based model and a self-attention based model as baselines. The results show the effectiveness of our model: it achieves better performance than CNN-based models and is competitive with the state-of-the-art self-attention based model.
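The core of contribution 1 can be illustrated with a minimal NumPy sketch of the two convolution variants. This is an assumption-laden illustration, not the thesis's actual implementation: the head count, kernel size, padding scheme, and the function names `lightweight_conv` and `dynamic_conv` are all hypothetical, and production versions would be GPU kernels over batched tensors. It only shows why the cost is O(N·k) per position instead of O(N²): each output looks at a fixed-size window with a small, softmax-normalized kernel, which in the lightweight case is shared across all channels of a head and in the dynamic case is predicted from the current input.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lightweight_conv(x, w, num_heads):
    """Lightweight convolution sketch.

    x: (seq_len, d_model) input sequence
    w: (num_heads, kernel_size) kernel weights, shared by every
       channel within a head and softmax-normalized over the kernel.
    Cost per position is O(k), so O(N * k) overall vs O(N^2) for
    self-attention.
    """
    seq_len, d_model = x.shape
    k = w.shape[1]
    w = softmax(w, axis=-1)                    # normalize each head's kernel
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    out = np.zeros_like(x)
    ch_per_head = d_model // num_heads
    for h in range(num_heads):
        cols = slice(h * ch_per_head, (h + 1) * ch_per_head)
        for t in range(seq_len):
            # the same k weights are reused for every channel in this head
            out[t, cols] = w[h] @ xp[t:t + k, cols]
    return out

def dynamic_conv(x, proj):
    """Dynamic convolution sketch: the kernel is a function of x_t.

    proj: (d_model, kernel_size) linear map producing a
          position-specific kernel w_t = softmax(x_t @ proj).
    """
    seq_len, d_model = x.shape
    k = proj.shape[1]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(seq_len):
        w_t = softmax(x[t] @ proj)             # kernel depends on the timestep
        out[t] = w_t @ xp[t:t + k]
    return out
```

With all-zero kernel logits, the softmax yields a uniform kernel and `lightweight_conv` reduces to a local moving average, which makes the linear-in-N windowed behavior easy to verify by hand.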
Keywords/Search Tags:Image caption, Neural Network, Lightweight Convolution, Dynamic Convolution, Adaptive Attention