
Research On Image Caption Based On Dual LSTM

Posted on: 2019-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Tao
Full Text: PDF
GTID: 2348330542498254
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, deep learning based on neural networks has developed rapidly and shown excellent performance in object recognition and speech recognition, but it has largely been limited to a single function in a single scene. Image captioning is a task at the intersection of computer vision and natural language processing; its purpose is to describe the semantic content of an image in natural language. Deep learning approaches have achieved some results on this task, but problems remain, such as poor performance, crude descriptions, and a lack of semantic information. This thesis combines local and global image features: it designs a dual long short-term memory (LSTM) model based on multiscale fusion and the bagging ensemble learning algorithm to generate image captions, and proposes a hierarchical attention mechanism to extract local image features. The main contents and key points are summarized as follows:

(1) Design and implementation of an image caption model based on dual LSTM. Bagging is an ensemble learning algorithm that combines several parallel models and can significantly improve model performance. Multiscale fusion is a common feature fusion method for handling image features. Using a convolutional neural network and LSTM, this thesis designs a dual LSTM model based on multiscale fusion and ensemble learning. Local image features are added to the global image features, forming multiscale features through a parallel connection, which strengthens feature expression and improves the accuracy and semantic richness of the descriptions.

(2) Design and implementation of the local image feature extraction module. Selective search is a method that generates region proposals from low-level image features. The attention mechanism is a widely used machine learning method for obtaining local image content dynamically. This thesis proposes a hierarchical attention mechanism based on selective search, the attention mechanism, and a logistic regression classifier to extract high-quality proposals and generate more effective features. Different global distributions yield different local features, which in turn improves description performance.

(3) Discussion of fusion operations. Fusion operations between global and local image features are compared experimentally on the dual LSTM model. By analyzing training metrics and evaluation metrics, the best fusion operation for the dual LSTM model, and hence the best image caption model, is identified.
Keywords/Search Tags:image caption, convolutional neural network, long short-term memory, multiscale fusion, attention
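
The abstract describes attending over local region features and then fusing the result with the global image feature, but it does not say which fusion operations were compared or how attention scores are computed. The sketch below is only an illustration under stated assumptions: the function names (`attend`, `fuse`), the softmax-weighted average used for attention, and the three candidate operations (concatenation, element-wise addition, element-wise multiplication) are hypothetical choices for this example and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, scores):
    """Soft attention (assumed form): weighted average of region feature vectors.

    region_features: list of equal-length vectors, one per region proposal.
    scores: one relevance score per region (how they are produced is not modeled here).
    """
    weights = softmax(scores)
    dim = len(region_features[0])
    return [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]


def fuse(global_feat, local_feat, op="concat"):
    """Fuse a global and a local feature vector with one candidate operation."""
    if op == "concat":
        # Parallel connection: keep both vectors side by side.
        return list(global_feat) + list(local_feat)
    if len(global_feat) != len(local_feat):
        raise ValueError("element-wise fusion needs vectors of equal length")
    if op == "add":
        return [g + l for g, l in zip(global_feat, local_feat)]
    if op == "multiply":
        return [g * l for g, l in zip(global_feat, local_feat)]
    raise ValueError(f"unknown fusion operation: {op}")


# Toy example: two region proposals with equal scores, then fuse with a global vector.
local = attend([[1.0, 3.0], [3.0, 5.0]], scores=[0.0, 0.0])
fused = fuse([0.5, 0.5], local, op="concat")
```

In a full captioning model the fused vector would feed the LSTM decoder; comparing the `op` choices on validation metrics corresponds to the kind of experiment point (3) describes.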