
Research On Semantic Attribute Based Visual Semantic Image Captioning Method

Posted on: 2019-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2428330572955895
Subject: Engineering
Abstract/Summary:
Images contain rich visual information and play an indispensable role in people's lives. Automatically generating natural language descriptions for images is one of the key problems in image understanding. Image captioning generates a natural language description through a language model, based on an understanding of the image's context. It is an extremely challenging task, since it involves converting between the visual and linguistic modalities and therefore requires fusing techniques from computer vision and natural language processing. Since semantic attributes serve as a bridge between visual information and semantic description, we build on existing methods and propose a semantic attribute weighting based deep visual semantic image captioning model. The novelties of our method are summarized as follows.

We propose a general semantic attribute based visual semantic image captioning framework. First, we analyze the word frequency of the captions in the training set and collect the top 1,000 words to build an attribute vocabulary. Next, we add a weakly supervised multiple-instance learning layer on top of a VGG-16 network for semantic attribute detection. We then rank the detected semantic attributes in descending order of probability and select those with high probability as the general semantic attribute set. Finally, we establish a mapping that weights the image feature by the general semantic attributes, and this attribute-weighted image representation is fed into an LSTM for caption generation.

We further propose a semantic attribute weighting based deep visual semantic image captioning framework. First, we use CNNs pre-trained on the ImageNet and Places365 datasets to extract object-based and scene-based features of the given image, combine them into a multi-feature representation, and use this representation together with the text information to build a joint embedding space. Next, we retrieve the captions most similar to the image from this multi-feature visual-semantic embedding space as candidate sentences and select their high-frequency words. We then re-rank the general semantic attributes according to these high-frequency words. Finally, we weight the corresponding image visual feature by the re-ranked semantic attributes, feed the weighted feature into an LSTM, and use the BLEU-4 similarity score to supervise caption generation.

Results on the Microsoft COCO 2014 dataset show that our framework achieves significant performance improvements over other current image captioning methods: our BLEU-4 and CIDEr scores reach 0.343 and 1.065, respectively, and our semantic attribute weighting based deep visual semantic image captioning algorithm significantly outperforms state-of-the-art approaches. The research results can be applied to content-based large-scale image retrieval and management, image understanding, visual navigation, and other tasks.
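The attribute-vocabulary construction and attribute-based feature weighting described above can be sketched as follows. This is a minimal illustration, not the thesis code: the probability threshold and the simple scalar weighting are assumptions, whereas the thesis learns a mapping from attributes to feature weights.

```python
from collections import Counter

def build_attribute_vocab(captions, top_k=1000):
    """Count word frequencies over the training captions and keep the top-k words."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    return [w for w, _ in counts.most_common(top_k)]

def weight_feature(image_feature, attribute_probs, threshold=0.5):
    """Select high-probability attributes and use them to weight the image feature.

    image_feature: CNN feature vector (list of floats).
    attribute_probs: attribute word -> detection probability.
    The threshold and the averaged scalar weight below are illustrative only.
    """
    selected = [p for p in attribute_probs.values() if p >= threshold]
    scale = sum(selected) / len(selected) if selected else 1.0
    return [x * scale for x in image_feature]
```

In the full model, the weighted representation produced here would be the input to the LSTM decoder at each captioning step.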
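The cross-modal retrieval and attribute re-ranking steps can likewise be sketched with cosine similarity in a shared embedding space. The function names, the use of plain cosine similarity, and the toy embeddings are illustrative assumptions; the thesis learns the joint embedding from multi-feature visual and textual inputs.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_candidates(image_emb, caption_embs, captions, top_k=5):
    """Rank dataset captions by similarity to the image embedding; keep top-k."""
    scored = sorted(zip(captions, caption_embs),
                    key=lambda pair: cosine(image_emb, pair[1]),
                    reverse=True)
    return [c for c, _ in scored[:top_k]]

def rerank_attributes(attributes, candidate_captions):
    """Re-rank attributes by their frequency in the retrieved candidate captions."""
    words = Counter(w for cap in candidate_captions for w in cap.lower().split())
    return sorted(attributes, key=lambda a: words[a], reverse=True)
```

The re-ranked attribute list is what drives the second round of feature weighting before decoding.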
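The BLEU-4 similarity score used for supervision can be illustrated with a minimal sentence-level implementation: the geometric mean of 1- to 4-gram modified precisions with a brevity penalty. This single-reference, unsmoothed version is a sketch only; standard evaluations use the corpus-level, multi-reference metric.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_ngrams & r_ngrams).values())  # clipped n-gram matches
        total = max(sum(c_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_mean = sum(math.log(p) for p in precisions) / 4
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_mean)
```

A perfect match scores 1.0, and any candidate sharing no 4-gram with the reference scores 0.0, which is why smoothed variants are preferred for short sentences in practice.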
Keywords/Search Tags: Semantic Attribute, Feature Extraction, Cross-Modal Retrieval, Similarity Supervision, Image Captioning