
Research On Semantic Attribute Based Visual Semantic Image Captioning Method

Posted on: 2019-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2428330572955895
Subject: Engineering
Abstract/Summary:
Images contain rich visual information and play an indispensable role in people's lives. Automatically generating natural language descriptions for images is one of the key problems in image understanding. Image captioning generates a natural language description through a language model, based on an understanding of the image's context. It is an extremely challenging task, since it involves converting between the visual and linguistic modalities and therefore requires fusing techniques from computer vision and natural language processing. Since semantic attributes serve as a bridge between visual information and semantic description, we build on existing methods and propose a semantic attribute weighting based deep visual semantic image captioning model. The novelties of our method are summarized as follows.

We propose a general semantic attribute based visual semantic image captioning framework. First, we analyze the word frequency of the captions in the training set and collect the top 1,000 words to build an attribute vocabulary. Next, we add a weakly supervised multiple-instance learning layer on top of a VGG-16 network for semantic attribute detection. We then rank the detected semantic attributes in descending order of probability and select those with high probability as the general semantic attribute set. Finally, we establish a mapping that weights the image feature by the general semantic attributes, and this attribute-weighted image representation is fed into an LSTM for caption generation.

We further propose a semantic attribute weighting based deep visual semantic image captioning framework. First, we use CNNs pre-trained on the ImageNet and Places365 datasets to extract object-based and scene-based features of the given image, combine them into a multi-feature representation, and use this representation together with the text information to build a joint embedding space. Next, we retrieve the captions most similar to the image from this multi-feature visual-semantic embedding space as candidate sentences and select their high-frequency words. We then re-rank the general semantic attributes according to these high-frequency words. Finally, we weight the corresponding image visual feature by the re-ranked semantic attributes, feed the weighted feature into an LSTM, and use the BLEU-4 similarity score to supervise caption generation.

Results on the Microsoft COCO 2014 dataset show that our framework achieves significant performance improvements over other current image captioning methods: our BLEU-4 and CIDEr scores reach 0.343 and 1.065, respectively, and our semantic attribute weighting based deep visual semantic image captioning algorithm significantly outperforms state-of-the-art approaches. The research results can be applied to content-based large-scale image retrieval and management, image understanding, visual navigation, and other tasks.
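The attribute-vocabulary construction and attribute-based feature weighting described above can be sketched as follows. This is a minimal illustration, not the thesis code: the probability threshold and the simple scalar weighting are assumptions, whereas the thesis learns a mapping from attributes to feature weights.

```python
from collections import Counter

def build_attribute_vocab(captions, top_k=1000):
    """Count word frequencies over the training captions and keep the top-k words."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    return [w for w, _ in counts.most_common(top_k)]

def weight_feature(image_feature, attribute_probs, threshold=0.5):
    """Select high-probability attributes and use them to weight the image feature.

    image_feature: CNN feature vector (list of floats).
    attribute_probs: attribute word -> detection probability.
    The threshold and the averaged scalar weight below are illustrative only.
    """
    selected = [p for p in attribute_probs.values() if p >= threshold]
    scale = sum(selected) / len(selected) if selected else 1.0
    return [x * scale for x in image_feature]
```

In the full model, the weighted representation produced here would be the input to the LSTM decoder at each captioning step.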
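The cross-modal retrieval and attribute re-ranking steps can likewise be sketched with cosine similarity in a shared embedding space. The function names, the use of plain cosine similarity, and the toy embeddings are illustrative assumptions; the thesis learns the joint embedding from multi-feature visual and textual inputs.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_candidates(image_emb, caption_embs, captions, top_k=5):
    """Rank dataset captions by similarity to the image embedding; keep top-k."""
    scored = sorted(zip(captions, caption_embs),
                    key=lambda pair: cosine(image_emb, pair[1]),
                    reverse=True)
    return [c for c, _ in scored[:top_k]]

def rerank_attributes(attributes, candidate_captions):
    """Re-rank attributes by their frequency in the retrieved candidate captions."""
    words = Counter(w for cap in candidate_captions for w in cap.lower().split())
    return sorted(attributes, key=lambda a: words[a], reverse=True)
```

The re-ranked attribute list is what drives the second round of feature weighting before decoding.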
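The BLEU-4 similarity score used for supervision can be illustrated with a minimal sentence-level implementation: the geometric mean of 1- to 4-gram modified precisions with a brevity penalty. This single-reference, unsmoothed version is a sketch only; standard evaluations use the corpus-level, multi-reference metric.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_ngrams & r_ngrams).values())  # clipped n-gram matches
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_mean = sum(math.log(p) for p in precisions) / 4
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_mean)
```

A perfect match scores 1.0, and any candidate sharing no 4-gram with the reference scores 0.0, which is why smoothed variants are preferred for short sentences in practice.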
Keywords/Search Tags: Semantic Attribute, Feature Extraction, Cross-Modal Retrieval, Similarity Supervision, Image Captioning