Visual Question Answering Of Sport Scenes Based On Graph Neural Networks

Posted on: 2021-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wei
GTID: 2428330611982785
Subject: Control theory and control engineering
Abstract/Summary:
The rapid development of computer vision and natural language processing has greatly advanced the study of downstream tasks that span both fields, such as visual question answering (VQA). The goal of VQA is to predict an answer given an image and a corresponding natural language question. Compared with static images, dynamic images, represented here by sports scenes, carry deep semantic information such as action, state and trend, and are therefore of great research value. Current research mainly exploits the visual information in the image but overlooks how much the relationships between the words in a question matter for predicting the answer correctly. This thesis therefore proposes capturing the relationships between objects in an image and the relationships between words in a question simultaneously. First, dynamic images represented by sports scenes are constructed to explore this deep semantic information. A two-channel self-attention VQA model is then built using the attention mechanism; this baseline model is used to verify that the relationships between words in a question are important for predicting the answer correctly. Next, the relationships between objects in an image and between words in a question are captured with graph neural networks. Three VQA models are designed: a dual-channel graph attention network, a dual-channel graph convolutional network, and a dual-channel attention-weighted graph convolutional network. Extensive experiments are conducted, including comparison experiments, ablation experiments and visualization analysis. The results show that capturing the relationships between objects and between words simultaneously improves the performance of the VQA model, which verifies the effectiveness of the proposed method.
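The dual-channel idea can be illustrated with a minimal sketch. The abstract does not give the thesis's actual architecture, so everything below is an assumption made for illustration: the function names (`gcn_layer`, `mean_pool`, `dual_channel_encode`) are hypothetical, the propagation rule is the widely used symmetric-normalized graph convolution, and the mean pooling and element-wise-product fusion are placeholder choices rather than the author's method. Plain Python lists are used so the example runs without third-party libraries.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : n x n adjacency matrix with non-negative entries
    feats  : n x d node features (objects in one channel, words in the other)
    weight : d x o projection matrix (learned in a real model, fixed here)
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(weight), len(weight[0])
    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n))
            for d in range(d_in)] for i in range(n)]
    # Project and apply ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def mean_pool(node_feats):
    """Average node features into a single graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def dual_channel_encode(obj_adj, obj_feats, word_adj, word_feats,
                        w_obj, w_word):
    """Encode the object graph and the word graph separately, then fuse.

    Fusion by element-wise product is an illustrative assumption.
    """
    obj_vec = mean_pool(gcn_layer(obj_adj, obj_feats, w_obj))
    word_vec = mean_pool(gcn_layer(word_adj, word_feats, w_word))
    return [o * w for o, w in zip(obj_vec, word_vec)]
```

In this sketch each channel builds its own graph (image objects in one, question words in the other) and the two pooled vectors are combined only at the end; a graph attention variant would replace the fixed normalized adjacency with learned attention weights between nodes.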
Keywords/Search Tags: visual question answering, attention mechanism, graph attention network, graph convolutional network