
Generating Image Captions From Structural Words

Posted on: 2018-05-04  Degree: Master  Type: Thesis
Country: China  Candidate: S B Ma  Full Text: PDF
GTID: 2348330542484891  Subject: Software engineering
Abstract/Summary:
With the continuous development of artificial intelligence in the field of multimedia, and of deep learning in particular, generating semantic descriptions for images has become increasingly prevalent in recent years. A sentence that contains objects together with their attributes and the activity or scene involved is more informative: it expresses more of the image's semantics and is easier to understand. In this paper, we focus on generating descriptions for images from structural words we have recognized, i.e., a tetrad of <object, attribute, activity, scene>, using a two-step framework. In the first step, we propose a multi-task learning method to recognize the structural words <object, attribute, activity, scene>. In the second step, taking the word sequence as the source language, we train an LSTM encoder-decoder machine translation model to output the target caption. In particular, the generated description is composed of objects with their attributes, such as color and size, and the corresponding activities or scenes. To demonstrate that the multi-task learning method generates structural words effectively, we conduct experiments on the benchmark datasets aPascal and aYahoo. We also use the UIUC Pascal, Flickr8k, Flickr30k, and MSCOCO datasets to show that translating structural words into sentences achieves promising performance compared with state-of-the-art image captioning methods in terms of language generation metrics.
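The two-step framework above can be sketched in code. This is a minimal, untrained illustration, not the thesis's actual model: the class names, vocabulary sizes, and random weights are all hypothetical, and real systems would use a CNN feature extractor and train both stages on the datasets mentioned. It only shows the shape of the pipeline: a shared image feature feeding four task-specific heads (multi-task recognition of the tetrad), whose output word sequence is then fed to an LSTM encoder-decoder that greedily decodes a caption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StructuralWordRecognizer:
    """Step 1 (sketch): one shared image feature, four task heads,
    one head per slot of the tetrad <object, attribute, activity, scene>."""
    def __init__(self, feat_dim, vocab_sizes):
        # Hypothetical linear heads; a real model would share a CNN backbone.
        self.heads = {task: rng.standard_normal((feat_dim, n)) * 0.01
                      for task, n in vocab_sizes.items()}

    def predict(self, feat):
        # Arg-max word index per task.
        return {task: int(np.argmax(softmax(feat @ W)))
                for task, W in self.heads.items()}

class LSTMCell:
    """Single numpy LSTM cell (input/forget/output/candidate gates)."""
    def __init__(self, in_dim, hid_dim):
        self.hid = hid_dim
        self.W = rng.standard_normal((in_dim + hid_dim, 4 * hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

class WordsToCaption:
    """Step 2 (sketch): treat the structural-word sequence as the source
    language and decode a target caption with an encoder-decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb=16, hid=32):
        self.src_emb = rng.standard_normal((src_vocab, emb)) * 0.1
        self.tgt_emb = rng.standard_normal((tgt_vocab, emb)) * 0.1
        self.enc = LSTMCell(emb, hid)
        self.dec = LSTMCell(emb, hid)
        self.out = rng.standard_normal((hid, tgt_vocab)) * 0.1
        self.tgt_vocab = tgt_vocab

    def translate(self, src_ids, bos=0, eos=1, max_len=10):
        h, c = np.zeros(self.enc.hid), np.zeros(self.enc.hid)
        for idx in src_ids:                  # encode the word tetrad
            h, c = self.enc.step(self.src_emb[idx], h, c)
        caption, tok = [], bos
        for _ in range(max_len):             # greedy decoding
            h, c = self.dec.step(self.tgt_emb[tok], h, c)
            tok = int(np.argmax(softmax(h @ self.out)))
            if tok == eos:
                break
            caption.append(tok)
        return caption

# Toy vocabularies (hypothetical sizes); real ones come from the datasets.
vocab_sizes = {"object": 5, "attribute": 4, "activity": 3, "scene": 3}
recognizer = StructuralWordRecognizer(feat_dim=8, vocab_sizes=vocab_sizes)
words = recognizer.predict(rng.standard_normal(8))

translator = WordsToCaption(src_vocab=15, tgt_vocab=20)
caption = translator.translate(list(words.values()))
print(words, caption)
```

With random weights the output caption is meaningless token indices; the point is only the data flow, image feature to tetrad to caption, which matches the two steps described in the abstract.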
Keywords/Search Tags: Image Description, Structural Word, Multi-task Learning, LSTM, Machine Translation