In the Internet era, intelligent devices generate massive amounts of data, such as text and images. Images are intuitive and easy to understand; compared with images, text offers a traditional and concise way to express and exchange information. Image captioning is an attractive technology that automatically generates natural-language sentences describing the content of an image. It has been widely used in human-computer dialogue, image-text matching, and other applications. Compared with object detection, image captioning not only names the objects in an image but also describes the attributes of those objects and the relationships between them more precisely. Therefore, how to mine visual semantic words and how to establish their correlations are major challenges in image captioning. Our contributions are summarized as follows:

(1) We propose a new semantic graph construction method. Three kinds of nodes describe the important attributes of, and relations among, the objects in an image, and three deep-learning-based detection models detect and recognize the corresponding visual features. Specifically, Faster R-CNN detects the objects in an image, and each object word is mapped to the feature embedding vector of an object node in the semantic graph. An attribute detector, a simple multi-layer perceptron followed by a softmax function, predicts the attributes of each object; an independent, trainable word embedding layer encodes each attribute word as the feature embedding vector of an attribute node. A Bi-LSTM serves as the relationship detection model, predicting relationships by combining object and visual region features; relationship words are encoded in the same way as attribute nodes. Edges between nodes are represented in matrix form: object nodes and relationship nodes form three-tuples, while object nodes and attribute nodes form two-tuples. Joining the object nodes across these n-tuples yields the semantic graph. Finally, a graph convolutional network enhances the node representations with the visual relationships in the semantic graph.

(2) We propose a semantic sentinel mechanism for image captioning. The sentinel mechanism helps the model choose between the visual scene and the semantic graph when generating the next word. Visual features describe low-level cues such as salient regions and the spatial layout of objects, whereas the semantic graph captures high-level features such as object attributes and inter-object relations, which align more closely with natural-language sentences. At the same time, the semantic graph contains some interfering information, so it makes sense for the model to choose between visual and semantic information when generating a sentence. Specifically, the sentinel mechanism consists of a gating unit on the language-model LSTM and an adaptive attention module. The gating unit computes a sentinel vector from the semantic information together with the previous word and memory cell of the LSTM; the adaptive attention module then decides, through a sentinel gate, whether the visual information or the sentinel vector participates in generating the sentence.

(3) We conduct extensive experiments to evaluate the semantic graph and the sentinel mechanism. Ablation studies assess the effectiveness of the three node types in the semantic graph and of the semantic sentinel mechanism in image captioning, and the overall performance of the proposed method is evaluated on the MSCOCO dataset.
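The graph construction in (1) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the detector outputs are hard-coded placeholders (standing in for Faster R-CNN, the MLP attribute detector, and the Bi-LSTM relationship detector), and the embedding dimension and weight initialization are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimension (assumed for illustration)

# Placeholder detector outputs for one image.
objects = ["dog", "frisbee"]
attributes = {"dog": ["brown"]}            # object -> attribute words
relations = [("dog", "catch", "frisbee")]  # (subject, predicate, object) triples

# One embedding per word, standing in for the trainable embedding layer.
nodes = objects \
        + [a for attrs in attributes.values() for a in attrs] \
        + [r[1] for r in relations]
idx = {w: i for i, w in enumerate(nodes)}
H = rng.standard_normal((len(nodes), DIM))  # node feature matrix

# Adjacency matrix from the three-tuples (object, relation, object)
# and two-tuples (object, attribute), plus self-loops.
A = np.eye(len(nodes))
for s, p, o in relations:            # object -- relation -- object
    A[idx[s], idx[p]] = A[idx[p], idx[s]] = 1
    A[idx[p], idx[o]] = A[idx[o], idx[p]] = 1
for o, attrs in attributes.items():  # object -- attribute
    for a in attrs:
        A[idx[o], idx[a]] = A[idx[a], idx[o]] = 1

# One GCN layer, H' = ReLU(D^{-1/2} A D^{-1/2} H W), to enhance
# each node's representation with its graph neighborhood.
W = rng.standard_normal((DIM, DIM))
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))
H_out = np.maximum(A_hat @ H @ W, 0.0)
print(H_out.shape)  # one enhanced embedding per node
```

After this layer, each object node's embedding also reflects its attributes and its relations to other objects, which is what the caption decoder consumes.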
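The sentinel gating and adaptive attention in (2) can be sketched as follows. All dimensions, weight matrices, and variable names here are illustrative assumptions; the LSTM states are random placeholders rather than outputs of a trained language model.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8   # hidden size (assumed)
K = 5   # number of visual regions (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder LSTM quantities at decoding step t.
x_t = rng.standard_normal(D)     # input: previous word + semantic information
h_prev = rng.standard_normal(D)  # previous hidden state
c_t = rng.standard_normal(D)     # current memory cell

# Gating unit: g_t = sigmoid(W_x x_t + W_h h_{t-1}),
# sentinel vector s_t = g_t * tanh(c_t).
W_x = rng.standard_normal((D, D))
W_h = rng.standard_normal((D, D))
g_t = sigmoid(W_x @ x_t + W_h @ h_prev)
s_t = g_t * np.tanh(c_t)

# Adaptive attention: softmax over the K visual regions plus the sentinel.
V = rng.standard_normal((K, D))  # visual region features
scores = np.concatenate([V @ h_prev, [s_t @ h_prev]])
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()
beta = alpha[-1]  # sentinel gate: weight placed on semantic information

# Context vector mixing visual evidence and the semantic sentinel.
ctx = alpha[:-1] @ V + beta * s_t
```

A `beta` near 1 means the next word is generated mainly from the semantic graph side, while a `beta` near 0 keeps the model grounded in the visual regions.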