
Research On Image Caption Generation Based On Deep Learning

Posted on: 2022-02-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Liu
Full Text: PDF
GTID: 1488306551969919
Subject: Computer Science and Technology
Abstract/Summary:
Images and text are the two main carriers of information: images convey information vividly, while text is highly general and conveys information concisely. Image captioning aims to have a computer automatically describe a given image in text. It is widely used in image retrieval, human-computer dialogue, navigation for the blind, and other applications. This thesis studies image caption generation based on deep learning. Specifically, the work covers global-attention-based neural networks for image captioning, image captioning based on part-of-speech priors, a dual learning method for image caption generation, story generation based on a hierarchical topic network, and image comment generation based on an interweaved hierarchical network. The contributions can be summarized as follows:

(1) Image captioning approaches are easily disturbed by irrelevant regions when extracting visual information from local region features. To address this, a new caption generation approach based on a global attention network is proposed. The method first predicts, for each local region in the image, the probability that the region will be mentioned, and then integrates these cues into the visual information extraction process. This helps the caption model pay more attention to the most relevant local regions, so that more accurate visual information is available when generating the current word and the resulting captions are of higher quality.

(2) Most image captioning approaches use the scene graph as the image feature representation, but they often ignore the inherent relationship between the node type and the part of speech of a word. A new caption generation approach based on part-of-speech priors is proposed. The approach first predicts the part of speech of the word to be generated as prior knowledge, and then uses this information in computing the weights of the attention module so that more attention falls on the most relevant regions. For example, when generating adjectives, more attention is placed on the attribute nodes of the scene graph to extract the needed visual features, so that more accurate information is provided for word generation.

(3) Image caption generation and image generation have been studied separately, which ignores the intrinsic relationship between the two tasks. A dual learning method is proposed in this thesis. It builds on the fact that text generation and image generation can form a closed loop in which each provides informative feedback to the other. For each of the two models, a reward is obtained that serves as a new optimization target and guides the parameter updates toward a higher reward value. The joint dual learning method adopts an iterative learning procedure to improve the performance of both the caption generation model and the image generation model.

(4) The sequence of texts generated by an image caption model lacks intrinsic correlation and cannot form a complete story. A story generation method based on a hierarchical topic network is proposed to address this problem. Each sentence in the story is planned in advance with a topic, which guides the story model to generate a sentence on that specific topic, so that the generated sentences relate to one another and form a complete story.

(5) An image caption generation model cannot be used to generate comments on images shared in a social chat environment. A new image-to-text generation task, namely image commenting, is proposed. This task requires the comment generation model to capture rich contextual information beyond the image content, such as the user's emotions, opinions, and common-sense knowledge. This thesis proposes an interweaved hierarchical neural network that can switch between an emotional mode and a factual mode to generate socially engaging image comments.
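The idea in contribution (1), using per-region "mention" probabilities to steer attention away from irrelevant regions, can be illustrated with a minimal sketch. The abstract does not say how the probabilities are integrated, so the rule used here (multiply the local attention weights by the mention probabilities and renormalize) is an assumption for illustration only, and the function name `globally_gated_attention` and its parameters are hypothetical rather than taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def globally_gated_attention(region_features, local_scores, mention_probs):
    """Combine local attention with per-region mention probabilities.

    Assumption (not from the thesis): the global cue is applied by
    multiplying each local attention weight by the probability that
    the region will be mentioned, then renormalizing.
    """
    local_weights = softmax(local_scores)
    gated = [w * p for w, p in zip(local_weights, mention_probs)]
    total = sum(gated)
    # Fall back to plain local attention if every region is gated out.
    if total == 0.0:
        weights = local_weights
    else:
        weights = [g / total for g in gated]
    # Context vector: attention-weighted sum of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Three regions with equal local scores; the third is predicted irrelevant.
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
weights, context = globally_gated_attention(
    features, local_scores=[0.0, 0.0, 0.0], mention_probs=[0.9, 0.1, 0.0]
)
print(weights, context)
```

In this toy input the third region receives zero weight even though its local score equals the others, so its features do not enter the context vector used for the current word.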
Keywords/Search Tags:image captioning, attention mechanism, dual learning, story generation, image review, deep learning