
Research On Image Description Based On Multimodal Recurrent Network

Posted on: 2019-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Shu
Full Text: PDF
GTID: 2438330551460792
Subject: Computer application technology
Abstract/Summary:
Visual information plays an important role in the information that humans acquire. With improvements in digital imaging and large-capacity storage, digital images have become the most important carrier of visual information, and describing images with natural-language sentences has attracted increasing attention. Image captioning must not only identify the objects in an image but also describe the properties of those objects and the relationships among them, so a sentence description of an image can carry rich information. Earlier work on image description relied on methods such as template matching and similarity-based retrieval; recently, deep neural networks have become the leading approach, and many novel captioning models have been proposed. The research in this thesis is based on the multimodal recurrent neural network and comprises the following two parts:

1) A bidirectional multimodal recurrent neural network is proposed. Unfolding the traditional multimodal recurrent neural network over time steps shows that each generated word depends only on the words before it. However, every word in a sentence is related not only to the preceding words but also to the words that follow. The proposed bidirectional multimodal recurrent network is trained on bidirectional sentence sequences, and the final description is chosen according to the loss function. Experimental results demonstrate that the improved model achieves better performance.

2) Model performance is further improved with spatial and textual features. In the baseline model, image features are fed directly into the multimodal recurrent neural network at each time step. Instead, different weights can be assigned to each region of the image to model differences in attention, and the image features can be combined with textual features at each time step so that the image features become time-dependent. Feature fusion further improves the results, and the sentences generated by the improved models show that the proposed methods are effective.
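The two ideas above can be illustrated with a minimal, untrained NumPy sketch: spatial attention assigns a weight to each image region and pools them into a context vector, the multimodal step fuses that context with the word embedding and hidden state at every time step, and running the same unrolling over the reversed word sequence gives the backward pass of the bidirectional variant. All names, dimensions, and randomly initialised weights here are hypothetical placeholders; the actual models in the thesis use learned CNN region features and trained recurrent cells.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only)
n_regions, d_img = 4, 8      # image split into 4 regions of 8-dim features
d_embed, d_hidden = 6, 10    # word-embedding and hidden-state sizes
vocab = ["<s>", "a", "dog", "runs", "</s>"]

# Randomly initialised parameters stand in for trained weights
W_att = rng.normal(size=(d_img,))            # attention scorer
W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden
W_e = rng.normal(size=(d_hidden, d_embed))   # embedding-to-hidden
W_v = rng.normal(size=(d_hidden, d_img))     # image-context-to-hidden
E = rng.normal(size=(len(vocab), d_embed))   # word embeddings

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def attend(regions):
    """Spatial attention: weight each region, then pool into a context."""
    alpha = softmax(regions @ W_att)   # one weight per region, sums to 1
    return alpha, alpha @ regions      # weighted sum = context vector

def multimodal_step(h_prev, word_id, context):
    """Fuse hidden state, word embedding, and image context each step,
    so the image signal entering the recurrence is time-dependent."""
    return np.tanh(W_h @ h_prev + W_e @ E[word_id] + W_v @ context)

def run_sequence(word_ids, regions):
    """Unroll over a word sequence; reversing word_ids gives the
    backward direction of the bidirectional variant."""
    h = np.zeros(d_hidden)
    alpha, ctx = attend(regions)
    for w in word_ids:
        h = multimodal_step(h, w, ctx)
    return h, alpha

regions = rng.normal(size=(n_regions, d_img))
seq = [0, 1, 2, 3, 4]                         # "<s> a dog runs </s>"
h_fwd, alpha = run_sequence(seq, regions)     # forward direction
h_bwd, _ = run_sequence(seq[::-1], regions)   # backward direction
```

In a full model, both directions would be trained on the same caption, and the final description would be the one whose direction yields the lower loss, mirroring the selection rule described above.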
Keywords/Search Tags: image captioning, multimodal recurrent neural network, bidirectional sentence sequence, spatial features, textual features