Research On Image Caption Based On Dual LSTM

Posted on:2019-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:H Z Tao

Full Text:PDF

GTID:2348330542498254

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, deep learning based on neural networks has developed rapidly and performs very well in object recognition and speech recognition, but each such system is limited to a single function in a single scenario. Image captioning is a task at the intersection of computer vision and natural language processing: its goal is to describe the semantic content of an image in natural language. Deep learning has produced some results on this task, but problems remain, such as poor performance, crude descriptions and a lack of semantic information. This thesis combines local and global image features, designs a dual long short-term memory (LSTM) model based on multiscale fusion and the bagging ensemble learning algorithm to generate semantic descriptions of images, and proposes a hierarchical attention mechanism to extract local image features. The main contents and key points of this thesis are summarized as follows:

(1) Design and implementation of an image caption model based on dual LSTM. Bagging is an ensemble learning algorithm that combines several parallel models and can significantly improve model performance. Multiscale fusion is a common feature fusion method for handling image features. This thesis uses a convolutional neural network and LSTM networks to design a dual LSTM model based on multiscale fusion and ensemble learning. Local image features are added to the global image features, forming multiscale features connected in parallel; this strengthens feature expression and improves the accuracy and semantic richness of the descriptions.

(2) Design and implementation of the local image feature extraction module. Selective search is a method for generating region proposals from low-level image features. The attention mechanism is a widely used machine learning method for dynamically attending to local image content. This thesis proposes a hierarchical attention mechanism, built on selective search, the attention mechanism and a logistic regression classifier, to extract high-quality proposals and generate more effective features. Different global distributions yield different local features, which in turn improves the descriptions in finer detail.

(3) Discussion of fusion operations. Fusion operations between global and local image features are compared experimentally on the dual LSTM model. By analyzing training metrics and evaluation metrics, the best fusion operation for the dual LSTM model, and hence the best image caption model, is identified.
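The attention-based pooling of local features in (2) and the fusion of global and local features in (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the fusion operations it compares, so the three used here (concatenation, element-wise sum, element-wise product) are assumptions, as are the dot-product scoring, the toy feature values and the function names `attend` and `fuse`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, query):
    """Soft attention: pool region features, weighted by dot-product score with the query."""
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    pooled = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return pooled, weights


def fuse(global_feat, local_feat, op="concat"):
    """Combine a global and a local feature vector (candidate operations are assumed)."""
    if op == "concat":
        return list(global_feat) + list(local_feat)
    if len(global_feat) != len(local_feat):
        raise ValueError("sum and product need equal-length vectors")
    if op == "sum":
        return [g + l for g, l in zip(global_feat, local_feat)]
    if op == "product":
        return [g * l for g, l in zip(global_feat, local_feat)]
    raise ValueError(f"unknown fusion op: {op}")


# Toy data (hypothetical): one global feature and three region features, all 4-dimensional.
global_feat = [0.5, 0.1, 0.3, 0.9]
regions = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# In a real captioner the query would be the decoder's hidden state;
# here the global feature is reused as a stand-in.
local_feat, weights = attend(regions, query=global_feat)

for op in ("concat", "sum", "product"):
    print(op, fuse(global_feat, local_feat, op))
```

In a real system the region features would come from a CNN applied to region proposals and the fused vector would be fed to the LSTM decoder at each step. Concatenation keeps both feature sets intact at the cost of a wider input, while sum and product keep the dimension fixed but require matching sizes.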

Keywords/Search Tags:

image caption, convolutional neural network, long short-term memory, multiscale fusion, attention

Related items

1	Research On Image Caption Via Incorporating Attention And Long Short-Term Memory Network
2	Image Caption Model Based On Feature Extraction Via Dense Convolutional Neural Network
3	Research On Image Caption Based On Attention Mechanism
4	Research And Implementation Of Key Technologies Of Image Caption Based On Deep Learning
5	Research On Image Caption Based On Attention Mechanism
6	Research On Image Caption Based On Object-Attention Model
7	Study On Multi-Topic Based Image Caption
8	Study On Image Captioning Based On Spatial Topological Relationship
9	Image Caption Generator Using CNN And LSTM
10	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network