Research On Visual Question Answering Method Based On Graph Convolutional Network

Posted on:2021-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:L Ding

Full Text:PDF

GTID:2558307109959609

Subject:Computer technology

Abstract/Summary:

With the continuing development of computer vision and natural language processing, visual question answering (VQA), which combines knowledge from both fields, has become an important research direction in computer science. Given an image and a question as input, the goal of VQA is for the computer to combine the information in the image and the text and generate a natural-language answer. The task therefore requires multimodal understanding and reasoning over both image and text. Most VQA methods are end-to-end learning systems that treat VQA as a classification task: a pre-trained CNN processes the image, an RNN processes the text, and the two sets of features are then combined through a variety of techniques to predict the answer.

Graph networks have shown strong classification and reasoning ability, but images and text are Euclidean-domain data to which graph convolutional networks (GCNs) cannot be applied directly; the image and text features must first be expressed as graph-structured data. Graph networks can also suffer from over-smoothing during training: the nodes become less distinguishable as learning proceeds, which degrades the learning result. To address these problems, this thesis takes GCNs as its research object and proposes using the features of the multiple object instances in an image as the nodes of the graph, with the Euclidean distances between nodes forming the adjacency matrix. The graph network itself is also improved: self-connections are added to the forward propagation of the GCN to make nodes more distinguishable, and regularization terms are added to reduce over-fitting.

For feature extraction, Faster-RCNN extracts object-level region features from the image, GloVe encodes the question as a sequence of word vectors, and the sequence is fed to a GRU to extract the question features. The visual and text features are fused into graph-structured data, and the final answer is classified after learning through the graph convolutional layers. The thesis thus handles the VQA task with a GCN operating on the fused features. Experiments on the VQA 2.0 dataset give an average answer prediction accuracy of 66.63%, and optimizing the network with the improved GCN inter-layer propagation method raises accuracy by 0.21%. Compared with classical methods, the proposed method achieves higher prediction accuracy, which verifies its effectiveness on the VQA task.
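The graph construction and propagation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the abstract gives no formulas, so the sketch assumes the widely used symmetric-normalized GCN rule with self-connections, ReLU(D^-1/2 (A + I) D^-1/2 X W), and takes the pairwise Euclidean distances literally as edge weights. The function names, the `self_weight` parameter, and the feature sizes are hypothetical, and random vectors stand in for Faster-RCNN region features.

```python
import numpy as np


def distance_adjacency(features):
    """Pairwise Euclidean distances between node features, shape (n, n)."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gcn_layer(adjacency, features, weight, self_weight=1.0):
    """One GCN step with self-connections (assumed form, not from the thesis).

    Computes ReLU(D^-1/2 (A + self_weight * I) D^-1/2 X W), where D is the
    degree matrix of the self-connected adjacency.
    """
    n = adjacency.shape[0]
    a_hat = adjacency + self_weight * np.eye(n)   # add self-connections
    degree = a_hat.sum(axis=1)                    # > 0 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU


# Hypothetical sizes: 5 image regions with 8-dimensional features.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))       # stand-in for region features
adjacency = distance_adjacency(regions)
weight = rng.normal(size=(8, 4))        # layer weights (learned in practice)
hidden = gcn_layer(adjacency, regions, weight)
print(hidden.shape)
```

Because each node's own features enter its update through the added identity term, a node is not replaced purely by an average of its neighbours, which is the usual motivation for self-connections when nodes risk becoming indistinguishable.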

Keywords/Search Tags:

visual question answering, graph convolutional network, word embedding, GRU, Faster-RCNN

Related items

1	Research On Visual Question Answering Based On Graph Neural Networks And Attention Mechanisms
2	Research And Implement For Question Answering Based On Deep Learning And Knowledge Graph Embedding
3	Visual Question Answering Of Sport Scenes Based On Graph Neural Networks
4	Research On Knowledge Question Answering Method Based On Tourism Knowledge Graph
5	Research On Visual Question Answering Based On Deep Learning
6	The Design And Implementation Of Intelligent Question And Answering System
7	Research On Similar Problem Recognition For Question Answering System
8	Research On Deep Learning Algorithm For Automatic Question Answering
9	Research On Affective Visual Question Answering
10	Research And Application Of Visual Question Answering Based On Query Knowledge Embedding And Trilinear Joint Embedding