
Research And Application On Topic-specific Image Caption Generation Technique

Posted on: 2019-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhou
Full Text: PDF
GTID: 2348330545461549
Subject: Intelligent Science and Technology
Abstract/Summary:
Image caption generation is the task of automatically generating a textual description for an image. It connects the visual and textual modalities and involves computer vision, natural language processing, and many other techniques. Most work on image captioning generates a single caption per image, which may be incomplete and lack diversity.

This thesis proposes a topic-based image caption generation technique that first infers topics from the image and then generates a variety of topic-specific captions, each of which describes the image from a particular topic. The model is based on the encoder-decoder framework and consists of three modules: topic extraction and representation, image-based topic distribution prediction, and topic-specific caption generation.

The topic extraction and representation module mines the topic information contained in captions. It employs a topic model to extract latent topic information and construct topic representations. Text-based and multimodal topic models are each used in turn.

The topic distribution prediction module narrows the range of topics before captions are generated by inferring a topic distribution from the image. A classifier is trained with the image representation as input and the topic distribution as output, by minimizing the KL divergence between the predicted distribution and the ground-truth one. The experimental results show that the classifier predicts at least one topic correctly with probability 0.95 when the number of topics is set to 20.

The topic-specific caption generation module generates captions under the topic constraint. It employs an LSTM as the decoder, and the topic representation is used to supervise and guide the generation of each topic-specific caption.

The results on three public datasets, Flickr8k, Flickr30k and COCO, show that the proposed model performs better than the models it is compared against and can generate topic-specific captions with great diversity. The thesis also integrates the technique into a real-time image caption generation application that captures pictures via a computer camera and describes them dynamically, which further demonstrates the practicality and effectiveness of the proposed model.
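The KL-divergence objective and the "at least one topic" measure mentioned for the topic predictor can be illustrated with a minimal sketch. The abstract does not give the direction of the divergence or the exact definition of a correct topic prediction, so the sketch assumes KL(reference || predicted) and counts a hit when the top-k topics of the two distributions overlap. All function names and the parameter `k` are hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Turn raw classifier scores into a probability distribution over topics."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(reference, predicted, eps=1e-12):
    """KL(reference || predicted) for two discrete topic distributions.

    Assumption: the reference (topic-model) distribution is the first
    argument; the abstract does not state the direction.
    """
    return sum(
        r * math.log((r + eps) / (p + eps))
        for r, p in zip(reference, predicted)
        if r > 0
    )


def top_k_topics(dist, k):
    """Indices of the k most probable topics."""
    order = sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)
    return set(order[:k])


def at_least_one_topic_hit(reference, predicted, k=3):
    """Assumed hit criterion: the two top-k topic sets share at least one topic."""
    return bool(top_k_topics(reference, k) & top_k_topics(predicted, k))


# Toy example with 4 topics (the thesis reports results for 20 topics).
reference = [0.7, 0.2, 0.1, 0.0]
predicted = softmax([2.0, 0.5, 0.1, -1.0])
loss = kl_divergence(reference, predicted)
hit = at_least_one_topic_hit(reference, predicted, k=1)
```

In training, `loss` would be averaged over a batch and minimized with respect to the classifier parameters; `hit` would be averaged over a test set to give a rate comparable in spirit to the reported 0.95, subject to the assumed hit definition.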
Keywords/Search Tags: image caption, encoder-decoder, topic model