Research And Application Of Deep Learning Based Image Caption Model

Posted on: 2019-10-06  Degree: Master  Type: Thesis
Country: China  Candidate: J N Guan
GTID: 2428330590973943  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, image captioning has become one of the hot research topics. The task is to have a machine understand the content of an image and generate text that describes it. However, image captioning is often disturbed by non-salient information, such as the image background, which makes the generated captions prone to deviate from the actual content. This thesis proposes a dense-attention image caption model. Faster RCNN serves as the encoding layer and extracts image features, the dense-attention model LSTM-Attend decodes them and generates the description text, and the model parameters are optimized with policy gradient optimization from reinforcement learning. The model is evaluated on conventional image data. The experimental results show that it understands images and generates captions well, and that its captions are better than those of state-of-the-art models.

In addition, because information may be forgotten and lost while training a deep learning model for medical image captioning, we build a multi-modal aggregation layer that effectively fuses medical image information with text information, and we propose an image caption method based on Repeated Review. Within the encoder-decoder framework, medical images are abstracted into vector representations that serve as the initial vectors of the LSTM in the decoding layer, and multi-modal aggregation is applied during decoding. Experiments on an X-ray medical image dataset show that the model outperforms other popular models on several benchmarks.

The two image caption models proposed in this thesis are verified on regular image datasets and medical image datasets, respectively. The experiments show that the dense-attention method effectively avoids interference from non-salient information in the encoding layer and selectively outputs descriptions during decoding. The Repeated Review method and the multi-modal aggregation layer effectively fuse information, which significantly improves overall performance.
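
As a rough illustration of the attention step mentioned in the abstract, the sketch below computes soft attention weights over a set of image region features and forms a context vector for the decoder. The abstract does not give the dense-attention formulation used in LSTM-Attend, so this is generic dot-product soft attention, not the thesis's method. The names `soft_attention`, `region_features`, and `hidden_state` are hypothetical, and the feature values are toy numbers.

```python
import math


def soft_attention(region_features, hidden_state):
    """Generic dot-product soft attention over image regions (assumed form).

    region_features: list of K feature vectors, e.g. one per detected region.
    hidden_state: decoder state vector with the same dimension as each feature.
    Returns (weights, context): weights sum to 1, and context is the
    attention-weighted average of the region features.
    """
    # Relevance score of each region: dot product with the decoder state.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Numerically stable softmax over the K scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the region features.
    dim = len(region_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy input: three regions with 2-d features. The decoder state points along
# the first axis, so the first region should receive the largest weight.
regions = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
state = [1.0, 0.0]
weights, context = soft_attention(regions, state)
```

Regions that score low against the decoder state, such as background regions, contribute little to the context vector, which is the general mechanism by which attention reduces the influence of non-salient information.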
Keywords/Search Tags: Faster RCNN, image caption, attention model, multi-mode layer, bidirectional recurrent neural network