Research On Image Caption Based On Graph Deep Learning

Posted on: 2021-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2428330611451396
Subject: Software engineering
Abstract/Summary:
As a multimodal task combining computer vision and natural language processing, image captioning has attracted wide attention. Typically, a Convolutional Neural Network (CNN) is used to extract image features, and these features are then fed into a Recurrent Neural Network (RNN), which acts as a decoder and generates the description of the image. However, traditional approaches focus only on how to improve fine-grained image features and how to enhance the expressive ability of the decoder. Most of them ignore the semantic features contained in the input image. At the same time, the decoding process lacks word-level feature input and guidance. To overcome these problems, this thesis studies in depth how to improve the Encoder-Decoder framework. Starting from the encoder, it combines an object detection algorithm to identify the specific objects in the image, explores the interactions between those objects, constructs the corresponding semantic graph, obtains a logical feature representation of the semantic graph through a Graph Convolution Network (GCN), and treats this representation as the semantic information contained in the image. The experiments show, however, that too many objects cause the model's performance to decline, or the improvement to disappear entirely, so this thesis designs a module with a filtering mechanism that removes redundant objects and proposes a graph convolutional network model based on this gating filtering mechanism. Second, this thesis further studies the decoder and designs a decoder that provides guidance over visual regions, objects, and relationships simultaneously. By combining it with the graph convolution encoder and an image-region visual encoder, this thesis proposes a second graph convolutional network model based on the guidance mechanism. Finally, comparative experiments are conducted on MS-COCO. The results show that the two proposed models score 28.9% and 4.4% higher on BLEU@4 than the baseline methods NIC and Up-Down respectively, and 20.4% and 3.4% higher on CIDEr respectively, and that their overall performance is considerably better than that of the other models. The experimental results also demonstrate that exploring image semantic features and guiding semantic information can significantly improve the logical coherence and accuracy of the generated descriptions.
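As an illustration of the encoder idea summarized above, the following is a minimal sketch, in plain Python, of one graph-convolution step over detected-object nodes with a gate that filters out redundant objects first. It is not the thesis's implementation: the function name `gated_gcn_layer`, the scalar sigmoid gate, the hard `threshold`, the mean aggregation over neighbours, and the ReLU activation are all assumptions chosen for readability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_gcn_layer(features, edges, W, gate_w, gate_b=0.0, threshold=0.5):
    """One graph-convolution step with a per-object gate (hypothetical sketch).

    features: list of node feature vectors, one per detected object
    edges:    list of (i, j) pairs, treated as undirected relations
    W:        weight matrix, out_dim rows by in_dim columns
    gate_w:   gate weight vector of length in_dim (assumed scalar gate)
    """
    # Gate score in (0, 1) for every object node.
    gates = [
        sigmoid(sum(w * x for w, x in zip(gate_w, f)) + gate_b)
        for f in features
    ]
    keep = [g >= threshold for g in gates]

    # Neighbour lists with self-loops, restricted to objects that pass the gate.
    neighbours = {i: [i] for i in range(len(features)) if keep[i]}
    for i, j in edges:
        if keep[i] and keep[j]:
            neighbours[i].append(j)
            neighbours[j].append(i)

    # Mean-aggregate neighbour features, apply the linear map, then ReLU.
    out = {}
    for i, nbrs in neighbours.items():
        dim = len(features[i])
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out[i] = [max(0.0, h) for h in matvec(W, mean)]
    return out, gates
```

With three object nodes where the third receives a very low gate score, only the first two survive and exchange information along their shared edge; the edge to the filtered node is dropped along with it.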
Keywords/Search Tags:Image Caption, Graph Convolution Network, Semantic Guidance Mechanism, Deep Learning