Research And Application Of Deep Learning Based Image Caption Model

Posted on:2019-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:J N Guan

GTID:2428330590973943

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, image captioning has become one of the most active topics in a new area of research. Image caption research addresses how a machine can understand the content of an image and generate descriptive text for it. However, image captioning is often disturbed by non-salient information, such as the background of the image, which makes the generated captions prone to deviation. In this paper, a dense-attention image caption model is proposed. Faster R-CNN is employed as the encoding layer to extract image features, and the dense-attention model LSTM-Attend is used as the decoder to generate the description text. In addition, the model parameters are optimized with the policy gradient method from reinforcement learning. The model is applied to conventional image data. The experimental results show that the model has good image understanding and caption generation ability, and that its caption generation results are better than those of state-of-the-art models.

In addition, because information may be forgotten and lost during deep learning training for medical image captioning, we build a multi-modal aggregation layer to effectively fuse medical image information and text information, and we propose an image caption method based on a Repeated Review method. Within the encoder-decoder framework, medical images are abstracted into vector representations that serve as the initial vectors of the LSTM in the decoding layer. Meanwhile, multi-modal aggregation is applied during decoding. Our model is verified by experiments on an X-ray medical image dataset, and the results show that it outperforms other popular models on several benchmarks.

The two image caption models proposed in this paper are verified on regular image datasets and medical image datasets, respectively. The experiments show that the dense-attention method can effectively avoid interference from non-salient information in the encoding layer and selectively output descriptions during decoding. The repeated review method and the multi-modal layer can effectively fuse information, which significantly improves overall performance.
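
The abstract describes the dense-attention decoder only at a high level. To illustrate the general idea, here is a minimal sketch of one additive (Bahdanau-style) soft-attention step in Python, assuming a set of region feature vectors of the kind a Faster R-CNN encoder produces and the current LSTM hidden state. It is a generic textbook mechanism, not the thesis's LSTM-Attend model; the function names and the parameters `w_v`, `w_h`, `w_a` are hypothetical, and the toy values are hand-picked rather than learned.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions: List[Vector], hidden: Vector, w_v: Matrix,
                   w_h: Matrix, w_a: Vector) -> Tuple[Vector, Vector]:
    """One additive soft-attention step over region features.

    regions: one feature vector per image region (e.g. Faster R-CNN boxes)
    hidden:  the decoder LSTM hidden state at the current time step
    w_v, w_h, w_a: hypothetical learned parameters (set by hand below)
    Returns (attention weights alpha, context vector).
    """
    proj_h = matvec(w_h, hidden)
    scores = []
    for v in regions:
        proj_v = matvec(w_v, v)
        # Score e_i = w_a . tanh(W_v v_i + W_h h)
        joint = [math.tanh(a + b) for a, b in zip(proj_v, proj_h)]
        scores.append(sum(w * j for w, j in zip(w_a, joint)))
    alpha = softmax(scores)  # weights are positive and sum to 1
    dim = len(regions[0])
    # Context vector: attention-weighted average of the region features.
    context = [sum(a * v[d] for a, v in zip(alpha, regions)) for d in range(dim)]
    return alpha, context


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-d "region features"
    alpha, context = soft_attention(regions, [1.0, 0.0], identity, identity, [1.0, 1.0])
    print("weights:", [round(a, 3) for a in alpha])  # region 1 gets the largest weight
    print("context:", [round(c, 3) for c in context])
```

In a trained model, regions whose projected features match the decoder state poorly receive small weights. This is one way attention can down-weight non-salient background regions; here the parameters are fixed by hand purely for illustration.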

Keywords/Search Tags:

Faster R-CNN, image caption, attention model, multi-modal layer, bidirectional recurrent neural network

Related items

1	Image Chinese Caption Generation Based On Visual Attention And Topic Model
2	Research On Image Caption Based On Object-Attention Model
3	Image Registration Algorithm Based On Faster RCNN
4	Research On Object Recognition And Grasp Based On Faster-RCNN
5	Research On Image Caption Model Based On Deeping Learning
6	An Approach Combined The Faster RCNN And MobileNet For Logo Detection
7	Research On Sensor Activity Recognition Based On Improved Deep Recurrent Neural Network
8	Image Caption Research Using Recurrent Neural Network
9	Mask-RCNN Based Image Chinese Caption Generator
10	Research On Image Caption Generation Method Based On Deep Learning