
Deep Learning For Image Captioning

Posted on: 2019-05-03    Degree: Master    Type: Thesis
Country: China    Candidate: S Liu    Full Text: PDF
GTID: 2428330611493339    Subject: Systems Engineering
Abstract/Summary:
With the development of network communication and multimedia technology, the ways in which people acquire knowledge and communicate with each other are undergoing profound changes, and ever more multimedia information, such as text, images, and videos, pours into people's view. Image captioning is a multi-modal technique that bridges image and text: it combines two key areas, computer vision and natural language processing, to translate an image into a textual description. It has many applications, such as image retrieval and network image analysis.

This paper adopts an encoder-decoder framework that automatically generates a description for a given picture by learning the characteristics of the images and sentences in the data set. The model involves two kinds of deep neural networks, CNNs and RNNs, which have been widely used in machine learning in recent years. This paper proposes an adaptive attention mechanism based on text traction. The structure is applied to CNN-RNN and CNN-CNN model frameworks respectively, so that the model can reason like a human and dynamically attend to different image regions when generating the related words. The work and research results of this paper mainly include the following aspects:

(1) For the task of image captioning, this paper proposes a method that finds a representative text-guided feature for a given image, in order to overcome the heterogeneity of low-level features between images and text. Given a query image, the text traction vector is obtained through a series of operations: finding the nearest-neighbor images, selecting a "consensus sentence", and mapping its features. The text traction vector serves as a bridge between the image and text modalities throughout the caption-generation process.

(2) This paper designs a CNN-RNN framework based on the text-traction attention mechanism. The description of an image depends on both visual information and a language model. In this paper, the text-guided vector is merged into the attention mechanism, so that the decoder can adaptively adjust the region of visual concentration, thereby generating more natural descriptions and effectively improving the experimental results.

(3) This paper designs a CNN-CNN framework based on the text-traction attention mechanism. The parallel computation of CNN models in deep learning frameworks, together with GPU acceleration, enables the CNN decoder to stack multiple network layers instead of using recurrent paths to memorize context information. The experiments analyze the influence of the number of layers and the kernel size, as well as the quality of the generated descriptions and the training and test times of the two model architectures.
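The three steps behind the text traction vector (nearest-neighbor images, "consensus sentence" selection, feature mapping) can be sketched roughly as follows. This is a minimal NumPy sketch under assumed representations: image features and caption embeddings as fixed-length vectors, cosine similarity as the matching criterion, and a fixed projection for the mapping step. The function names and these choices are illustrative, not the thesis's exact procedure.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between vector a and each row of matrix b.
    return (b @ a) / (np.linalg.norm(b, axis=1) * np.linalg.norm(a) + 1e-8)

def text_traction_vector(query_feat, db_feats, db_caption_embs, k=3):
    """Sketch of the three steps: nearest-neighbour images,
    'consensus sentence' selection, and feature mapping."""
    # 1) Nearest-neighbour images of the query in visual feature space.
    nn_idx = np.argsort(-cosine_sim(query_feat, db_feats))[:k]
    # 2) Consensus sentence: among the neighbours' caption embeddings,
    #    pick the one closest to their mean (a common consensus criterion).
    cand = db_caption_embs[nn_idx]
    centroid = cand.mean(axis=0)
    consensus = cand[np.argmax(cosine_sim(centroid, cand))]
    # 3) Feature mapping: project the sentence embedding into the space
    #    used by the attention module (here a fixed random projection).
    W = np.random.default_rng(0).standard_normal((consensus.size, consensus.size))
    return np.tanh(W @ consensus)
```

In a trained model the projection `W` would be learned jointly with the captioner; a random matrix stands in for it here only to keep the sketch self-contained.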
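Merging the text-guided vector into the attention step of the CNN-RNN decoder, as in contribution (2), might look like the following sketch. The additive scoring form and the weight shapes are assumptions for illustration; they are not the thesis's exact equations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def text_guided_attention(regions, h, g, Wv, Wh, Wg, w):
    """regions: (R, d) CNN features for R image regions,
    h: (d,) RNN decoder hidden state, g: (d,) text traction vector.
    Returns the attended context vector and the attention weights."""
    # Additive attention score per region, conditioned on both the
    # decoder state h and the text-guidance vector g.
    scores = np.tanh(regions @ Wv + h @ Wh + g @ Wg) @ w   # (R,)
    alpha = softmax(scores)                                # weights sum to 1
    context = alpha @ regions                              # (d,) weighted sum
    return context, alpha
```

At each decoding step the RNN would consume `context` together with the previous word, so the visual focus shifts word by word under the guidance of `g`.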
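The CNN decoder in contribution (3) replaces recurrence with stacked convolutions whose receptive field over past words grows with depth and kernel size. Below is a minimal sketch of such a causal 1-D convolution stack; the left-padding trick and the ReLU nonlinearity are illustrative assumptions, not the thesis's exact architecture.

```python
import numpy as np

def causal_conv1d(x, W):
    """x: (T, d_in) word representations; W: (k, d_in, d_out).
    Left-pads so that position t only sees positions <= t (no future words)."""
    k = W.shape[0]
    xp = np.vstack([np.zeros((k - 1, x.shape[1])), x])   # pad k-1 past steps
    out = np.stack([
        sum(xp[t + j] @ W[j] for j in range(k))
        for t in range(x.shape[0])
    ])
    return np.maximum(out, 0.0)  # ReLU

def cnn_decoder(x, layers):
    # Stacking n layers of kernel size k gives a receptive field of
    # n * (k - 1) + 1 past tokens, in place of an RNN's recurrent path.
    for W in layers:
        x = causal_conv1d(x, W)
    return x
```

Because every position in every layer can be computed independently, the whole sequence is processed in parallel on a GPU, which is the training-speed advantage the abstract attributes to the CNN-CNN architecture.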
Keywords/Search Tags:text-guided, deep learning, recurrent neural networks, convolutional neural networks, attention mechanism