
Collaborating General And Specific Semantics For Multi-feature Based Image Captioning

Posted on: 2020-12-25  Degree: Master  Type: Thesis
Country: China  Candidate: H Liu  Full Text: PDF
GTID: 2428330602452521  Subject: Signal and Information Processing
Abstract/Summary:
With the dramatic increase in Internet bandwidth and the proliferation of mobile devices, image data is generated, released, and spread rapidly under Web 2.0 technology and has become an indispensable part of today's big data. However, many images on the Internet are untagged. To store, manage, retrieve, and utilize these data more efficiently, researchers have in recent years worked on automatically describing image content with complete sentences, i.e., image captioning. Image captioning is very challenging: it must capture the visual representation of the objects and scenes presented in an image and express the relationships between them, and it must also describe them in appropriate natural language. To address these problems, we systematically study deep-learning-based image captioning. The main research contributions are as follows:

(1) We propose an LSTM-based image captioning framework that generates a sentence sequence from a multi-feature sequence. To describe image features more comprehensively, we train one ResNet-152 on the ImageNet dataset to extract object features and another ResNet-152 on the Places365 dataset to extract scene context features. These two complementary features jointly represent the objects and the scene context in an image. In addition, we use multi-instance attribute classifiers trained on the MSCOCO dataset to extract semantic information from the image as a supplement of general semantic priors for captioning. We feed the object features, scene context features, and visual semantics sequentially into the LSTM encoder to complete the feature representation of the image. Finally, an LSTM decoder translates the features into a language description; the framework is trained to translate multi-feature sequences into natural-language sequences under a cross-entropy loss. We evaluate our model on the MSCOCO dataset, and the comparison results show the superiority of our algorithm over state-of-the-art approaches on standard evaluation metrics.

(2) We propose a multi-feature-based image captioning framework that collaborates general and specific semantics. To better represent the semantic features of images, we extract the general semantic attributes of an image with the multi-instance attribute classifier trained on the MSCOCO dataset, and then retrieve similar semantics for the test image in an improved visual semantic embedding (VSE++) space as the image's specific semantic attributes. We then collaborate the general and specific semantic attributes as semantic priors and sequentially feed the collaborated semantic attributes, object features, and scene context features into the LSTM encoder as the feature representation of the image. In addition, we employ the specific semantics as a "specific semantic supervisor" that applies BLEU-4 similarity supervision to the candidate phrases during LSTM decoding, which yields a captioning method that collaborates specific semantic supervision with general semantics. Evaluation on the MSCOCO dataset shows the superiority of our model, which achieves better experimental results than state-of-the-art approaches.
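To make the encoder-decoder in contribution (1) concrete, the following PyTorch-style sketch feeds object, scene-context, and semantic features sequentially into an LSTM before decoding words. It is a minimal sketch, not the thesis implementation: the module names, projection layers, dimensions, and vocabulary size are all illustrative assumptions, and the ResNet-152 (ImageNet/Places365) features and multi-instance attribute scores are assumed to be precomputed.

```python
# Minimal sketch of the multi-feature LSTM encoder-decoder in (1).
# All dimensions and layer names are hypothetical placeholders.
import torch
import torch.nn as nn

class MultiFeatureCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, attr_dim=1000, hidden=512, vocab=10000):
        super().__init__()
        # Project each feature type into a common embedding space.
        self.obj_proj = nn.Linear(feat_dim, hidden)    # ImageNet ResNet-152 features
        self.scene_proj = nn.Linear(feat_dim, hidden)  # Places365 ResNet-152 features
        self.attr_proj = nn.Linear(attr_dim, hidden)   # multi-instance attribute scores
        self.word_emb = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, obj_feat, scene_feat, attrs, captions):
        # Encode: feed object, scene-context, and semantic features
        # sequentially into the LSTM, as the abstract describes.
        feats = torch.stack([self.obj_proj(obj_feat),
                             self.scene_proj(scene_feat),
                             self.attr_proj(attrs)], dim=1)
        _, state = self.lstm(feats)
        # Decode with teacher forcing: predict the next word at each step.
        h, _ = self.lstm(self.word_emb(captions), state)
        return self.out(h)  # (batch, time, vocab) logits
```

Training would then minimize `nn.CrossEntropyLoss()` between these logits and the ground-truth next words, matching the cross-entropy objective named in the abstract.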
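The collaboration step in contribution (2) can be pictured as retrieving nearest neighbors of the test image in a VSE++-style joint embedding space and merging their attributes with the classifier's general attributes. The sketch below is an assumption about one plausible fusion rule (a convex combination); the data layout, neighbor count, and fusion weight are illustrative, and the thesis's actual collaboration scheme may differ.

```python
# Illustrative sketch of collaborating general and specific semantics in (2).
import torch
import torch.nn.functional as F

def collaborate_semantics(img_emb, nbr_embs, nbr_attrs, gen_attrs, k=5, alpha=0.5):
    """Merge general and retrieved specific semantic attributes.

    img_emb:   (d,)   test image in the joint VSE++-style embedding space
    nbr_embs:  (N, d) embeddings of the retrieval pool
    nbr_attrs: (N, a) attribute vectors attached to pool entries
    gen_attrs: (a,)   general attributes from the multi-instance classifier
    """
    # VSE++ ranks by similarity in the joint space; cosine is used here.
    sims = F.cosine_similarity(img_emb.unsqueeze(0), nbr_embs, dim=1)
    topk = sims.topk(k).indices
    # Specific semantics: average the retrieved neighbors' attributes.
    spec_attrs = nbr_attrs[topk].mean(dim=0)
    # Collaborate the two priors (hypothetical convex combination).
    return alpha * gen_attrs + (1 - alpha) * spec_attrs
```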
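One way to read the "specific semantic supervisor" in (2) is as BLEU-4 re-scoring of candidate phrases during decoding against the retrieved specific sentences. The sketch below shows that reading only; the re-ranking function, weight, and data shapes are assumptions, not the thesis's stated procedure.

```python
# Hypothetical BLEU-4 supervision over decoder candidates, per (2).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def rerank_candidates(candidates, retrieved_sents, weight=0.2):
    """Re-rank beam candidates by BLEU-4 similarity to retrieved sentences.

    candidates:      list of (token_list, log_prob) pairs from the LSTM decoder
    retrieved_sents: tokenized sentences retrieved as specific semantics
    """
    smooth = SmoothingFunction().method1  # smooths zero n-gram counts
    scored = []
    for tokens, log_prob in candidates:
        # BLEU-4 of the candidate against all retrieved references.
        bleu = sentence_bleu(retrieved_sents, tokens, smoothing_function=smooth)
        scored.append((tokens, log_prob + weight * bleu))
    return sorted(scored, key=lambda x: x[1], reverse=True)
```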
Keywords/Search Tags: Image captioning, convolutional neural network, long short-term memory, cross-modal retrieval, general semantics, specific semantics