
Study On Image Captioning Based On Deep Learning

Posted on: 2021-01-15  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang  Full Text: PDF
GTID: 2428330611450559  Subject: Computational Mathematics
Abstract/Summary:
When Alex Krizhevsky's deep convolutional neural network won the 2012 ImageNet competition, it rekindled a wave of artificial intelligence research. As one of the important domains of artificial intelligence, computer vision has likewise developed rapidly with the rise of deep learning models. Modern life is filled with images, most of which carry no accompanying textual description. Humans understand them easily, but for machines it is quite difficult to describe an image comprehensively. The image captioning task takes a picture as input and outputs a natural-language description of it; it combines computer vision with natural language processing. This is undoubtedly more challenging than traditional object detection and segmentation, because the algorithm must not only detect objects but also understand the relationships between them and then describe those relationships in natural language.

Several problems remain in image captioning: (1) convolutional neural networks, the dominant image feature extractors in computer vision, cannot capture the relationships between image objects or their hierarchical interactions; (2) recurrent neural networks and their extensions (LSTM, GRU, etc.) have become popular and effective frameworks for cross-domain sequence modeling, yet the descriptions they generate are too simple, and the generation process involves no reasoning; (3) too few image attributes are used, so the generated descriptions lack specificity.

The main contents and contributions are summarized as follows:
(1) We propose image captioning based on a graph convolutional network (GCN), which models the hierarchical interaction between different levels of abstract visual information in the image and its bounding boxes. The encoder uses a GCN to extract image feature information, which the decoder then turns into a caption; this model achieves good results in experiments.
(2) Beam search is an approximate inference algorithm widely used for decoding sequences from unidirectional neural network models. Because generated captions are often too simple and fail to highlight the focus of the image, we fuse beam search with an attention mechanism to generate the caption. Experiments show that this gives the captioning process a degree of reasoning.
(3) Traditional image captioning produces descriptions that are incomplete, generic, and not specific to the image content. We apply the idea of Generative Adversarial Networks to caption generation, which makes the generated captions more flexible; experiments confirm the effectiveness of this method.
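To illustrate contribution (1), the GCN encoder's core operation is neighbourhood aggregation over a graph of detected image regions. The following is a minimal NumPy sketch of a single mean-normalised graph-convolution layer; the adjacency matrix, region features, and weight matrix here are invented stand-ins, not the thesis's actual model or data.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: aggregate neighbour features, then
    apply a learned linear transform and ReLU. `adj` is the adjacency
    matrix over detected regions, `feats` holds one feature vector per
    region (e.g. from a CNN detector), `weight` is the layer's parameters."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)       # node degrees
    a_norm = a_hat / deg                         # row-normalise: mean aggregation
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation

# Three hypothetical image regions; an edge marks regions whose boxes interact.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
feats = rng.standard_normal((3, 4))   # 4-d region features (stand-in for CNN output)
weight = rng.standard_normal((4, 2))  # learnable projection to 2 dimensions
out = gcn_layer(adj, feats, weight)
print(out.shape)  # one relation-aware feature vector per region
```

Stacking such layers lets each region's representation absorb information from its neighbours, which is how a GCN encoder can expose object-object relationships to the caption decoder.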
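For contribution (2), plain beam search (before any attention fusion) can be sketched as follows. The toy vocabulary, transition scores, and `step_fn` interface are invented for illustration; in the thesis the per-step scores would come from the decoder network.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=10):
    """Generic beam search over token sequences.

    step_fn(sequence) returns a list of (token, log_prob) candidates for
    the next position. Sequences ending in end_token are carried over."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))  # finished beam, keep as-is
                continue
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        # keep only the beam_width highest-scoring partial sequences
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]

# Toy "language model": transition log-probabilities from the last token.
VOCAB = {
    "<s>":  [("a", math.log(0.6)), ("the", math.log(0.4))],
    "a":    [("cat", math.log(0.7)), ("dog", math.log(0.3))],
    "the":  [("cat", math.log(0.5)), ("dog", math.log(0.5))],
    "cat":  [("</s>", 0.0)],
    "dog":  [("</s>", 0.0)],
}

def toy_step(seq):
    return VOCAB[seq[-1]]

print(beam_search(toy_step, "<s>", "</s>"))  # → ['<s>', 'a', 'cat', '</s>']
```

Because the beam keeps several hypotheses alive instead of committing greedily, fusing an attention distribution into `step_fn`'s scores lets the search prefer continuations that focus on salient image regions.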
Keywords/Search Tags: Deep Learning, Image Captioning, Convolutional Neural Networks, Recurrent Neural Networks, Scene Understanding