
Research On Image Caption Algorithm Based On Visual Relationship

Posted on: 2021-10-29  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhang  Full Text: PDF
GTID: 2518306047985919  Subject: Master of Engineering
Abstract/Summary:
In recent years, with the great achievements of deep learning in computer vision and natural language processing, it has become possible to use neural networks to describe complex visual concepts. Traditional image caption methods rely heavily on hard-coded visual concepts or fixed description templates, and struggle to generate diverse captions. In most deep-learning-based image caption methods, a convolutional network directly encodes the image into a single feature vector, which is then fed into a recurrent neural network to generate the text description. However, these methods neither fully mine the semantic information in the image nor consider the structured information between different image regions. As a result, most image caption models exhibit limited image understanding and poor scalability. This paper makes the following improvements to address these problems throughout the image caption pipeline.

First, the regional visual features are organized as a regional structured graph to enrich the representation of information in the image. The dependencies between vertexes in the graph are decomposed into conditional probabilities with the aid of statistics over the visual relationship triplets in the dataset, and these probabilities assign weights to the edges between the vertexes of the regional structured graph. A graph neural network is then used to learn graph embedding features for the visual regions of the image.

Furthermore, a novel visual relationship detection model based on graph neural networks is proposed by combining the semantic labels and location information of image regions. Experimental results show that this method achieves a large performance improvement on the large-scale Visual Genome dataset in the predicate detection task, and also achieves competitive results in visual relationship detection.

Since image knowledge in similar scenes is universal across task datasets, the proposed visual relationship detection model is used to extract the visual relationships of an image so that image knowledge can be shared. Because visual relationship triplets cannot be used directly in the image caption process, a semantic relationship graph is used to represent the triplets contained in the image. A Transformer serves as the backbone of the image caption model to fuse visual and semantic features: for the regional visual features, a multi-head attention mechanism attends to the features of different image regions; for the semantic relationship graph, a graph neural network encodes the graph into a semantic feature embedding matrix, and a double-layer attention mechanism provides guiding semantic information for the caption model. Experiments show that the caption model using both visual features and the semantic relationship graph achieves good performance compared with mainstream models.

In summary, the proposed image caption method based on visual relationships can fully mine the structured information of the image, alleviating the semantic gap between visual and textual information to a certain extent. In addition, it extends to related tasks such as scene graph generation and visual relationship detection.
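To make the graph-construction step described above concrete, here is a minimal sketch of how edge weights could be derived from triplet statistics. The triplet list, region labels, and the simple count-ratio estimate of P(object | subject) are illustrative assumptions, not the thesis's exact decomposition.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triplet statistics, standing in
# for counts gathered over a dataset such as Visual Genome.
triplets = [
    ("person", "riding", "horse"),
    ("person", "riding", "bike"),
    ("person", "near", "horse"),
    ("dog", "near", "horse"),
]

pair_counts = Counter((s, o) for s, _, o in triplets)
subj_counts = Counter(s for s, _, _ in triplets)

def edge_weight(subject, obj):
    """Estimate P(object | subject) from triplet co-occurrence counts.

    A minimal stand-in for the conditional-probability decomposition
    used to weight edges of the regional structured graph.
    """
    if subj_counts[subject] == 0:
        return 0.0
    return pair_counts[(subject, obj)] / subj_counts[subject]

# Weighted edges between detected region labels in one image.
regions = ["person", "horse", "dog"]
edges = {(a, b): edge_weight(a, b) for a in regions for b in regions if a != b}
print(edges)
```

In the full model, such weights would populate the adjacency matrix consumed by the graph neural network, as in the next sketch.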
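The graph neural network over the weighted regional graph could, under simple assumptions, look like the following one-layer message-passing sketch in PyTorch. The layer name RegionGraphLayer, the 256-dimensional features, and the sum-then-ReLU update are hypothetical choices for illustration, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class RegionGraphLayer(nn.Module):
    """One round of message passing over the weighted regional graph.

    Each region feature is updated with a weighted sum of its
    neighbours' features, where the weights come from the
    conditional-probability edges sketched above.
    """
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, feats, adj):
        # feats: (num_regions, dim) region visual features
        # adj:   (num_regions, num_regions) edge-weight matrix
        messages = adj @ feats            # aggregate neighbour features
        return torch.relu(self.linear(feats + messages))

feats = torch.randn(3, 256)               # 3 regions, 256-d features
adj = torch.rand(3, 3).fill_diagonal_(0)  # hypothetical edge weights
layer = RegionGraphLayer(256)
embedded = layer(feats, adj)
print(embedded.shape)  # torch.Size([3, 256])
```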
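For the Transformer-based fusion of regional visual features and the semantic relationship graph, the following sketch wires up two multi-head attention blocks: one attending over region features and one over the semantic graph embedding matrix. The tensor shapes, module names, and fusing the two contexts by addition are assumptions; the thesis's double-layer attention mechanism may differ in its exact wiring.

```python
import torch
import torch.nn as nn

dim, heads = 256, 8
visual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
semantic_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

words = torch.randn(1, 12, dim)      # partial caption states (query)
regions = torch.randn(1, 36, dim)    # regional visual features
sem_graph = torch.randn(1, 20, dim)  # semantic relationship-graph embeddings

v_ctx, _ = visual_attn(words, regions, regions)        # attend to image regions
s_ctx, _ = semantic_attn(words, sem_graph, sem_graph)  # attend to semantic graph
fused = v_ctx + s_ctx                                  # fused guidance for the decoder
print(fused.shape)  # torch.Size([1, 12, 256])
```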
Keywords/Search Tags: Visual Relationship, Regional Feature Graph, Graph Neural Network, Semantic Relationship Graph, Image Caption