
Research On Visual Question Answering Method Based On Deep Learning

Posted on: 2021-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Jiang
Full Text: PDF
GTID: 2518306122468664
Subject: Computer Science and Technology
Abstract/Summary:
With the accumulation of multimodal data and the rapid development of deep learning, cross-modal learning tasks represented by visual question answering (VQA) have received extensive attention and research. In visual question answering, given an image and a question in natural language, a model must reason over the visual elements of the image to infer the correct answer. VQA is a challenging multimodal learning task, since it requires understanding the textual and visual modalities simultaneously. The approaches used to represent the questions and images in a fine-grained manner therefore play a key role in performance. To obtain fine-grained representations, this thesis designs end-to-end deep neural network models based on the attention mechanism to jointly learn question and image features. The main work of this thesis includes:

1. To address the problem that the traditional co-attention mechanism cannot accurately locate the important words in the question and the related visual regions in the image, this thesis proposes the CAQT model. CAQT contains a co-attention mechanism that combines textual attention based on self-attention with question-guided visual attention. The self-attention-based textual attention finds the important words in the question and obtains a discriminative question representation; the resulting question feature then guides the visual attention computation, allowing the mechanism to locate image regions related to the question. In addition, this thesis introduces the question type into the CAQT model, dividing the questions in the VQA v1.0 and VQA v2.0 datasets into 8 categories. The question type is incorporated by directly concatenating its one-hot encoding with the multimodal joint representation, so the model knows the question type before answer prediction, which narrows the search range of the answer and thus improves performance. (A code sketch of this design follows the abstract.)

2. Since the features computed by an attention module may not be related to the query involved in the computation, this thesis proposes the double attention (DAtt) mechanism. DAtt's attention module consists of two parts: textual double attention and visual double attention. The double attention mechanism ensures that the features obtained by the attention computation remain related to the query, and focuses on input information relevant to the semantics of the question, thereby reducing the interference of irrelevant information. (A second sketch below illustrates one reading of this idea.)

3. All the methods proposed in this thesis are verified on the two benchmark datasets VQA v1.0 and VQA v2.0. The co-attention mechanism and the question-type module in the CAQT model improve answer accuracy, and the textual and visual double attention in the DAtt model likewise improve model performance.
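To make point 1 concrete, the following is a minimal PyTorch sketch of a CAQT-style pipeline: self-attention over question words, question-guided attention over image regions, and a question-type one-hot vector concatenated with the fused representation before answer prediction. All dimensions, module names, and the element-wise-product fusion are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoAttentionWithQuestionType(nn.Module):
        def __init__(self, q_dim=512, v_dim=2048, joint_dim=1024, n_types=8):
            super().__init__()
            self.q_score = nn.Linear(q_dim, 1)          # textual self-attention scores
            self.v_score = nn.Linear(q_dim + v_dim, 1)  # question-guided visual scores
            self.q_proj = nn.Linear(q_dim, joint_dim)
            self.v_proj = nn.Linear(v_dim, joint_dim)
            self.n_types = n_types

        def forward(self, q_words, v_regions, q_type):
            # q_words:   (B, T, q_dim) word features, e.g. from an LSTM
            # v_regions: (B, K, v_dim) region features, e.g. from Faster R-CNN
            # q_type:    (B,) integer question-type labels in [0, n_types)
            a_q = F.softmax(self.q_score(q_words), dim=1)        # (B, T, 1)
            q_feat = (a_q * q_words).sum(dim=1)                  # attended question

            # Tile the attended question and use it to guide attention over regions.
            q_tiled = q_feat.unsqueeze(1).expand(-1, v_regions.size(1), -1)
            a_v = F.softmax(self.v_score(torch.cat([q_tiled, v_regions], dim=-1)), dim=1)
            v_feat = (a_v * v_regions).sum(dim=1)                # attended image

            # Fuse the modalities and append the one-hot question type so the
            # answer classifier sees the question category directly.
            joint = self.q_proj(q_feat) * self.v_proj(v_feat)    # (B, joint_dim)
            type_onehot = F.one_hot(q_type, self.n_types).float()
            return torch.cat([joint, type_onehot], dim=-1)       # (B, joint_dim + n_types)

    # Example: model = CoAttentionWithQuestionType()
    #          out = model(torch.randn(2, 14, 512), torch.randn(2, 36, 2048),
    #                      torch.tensor([3, 7]))  # out.shape == (2, 1032)

The fused vector would then feed a standard answer classifier; concatenating the 8-way type one-hot is the inexpensive "tell the model the question category before prediction" step the abstract describes.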
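Point 2 does not spell out how the double attention is computed, so the sketch below shows one plausible reading, labeled as an assumption: a first attention pass produces an attended feature, and that feature then serves as the query for a second pass over the same inputs, re-checking that the final feature stays related to the original query.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DoubleAttention(nn.Module):
        # Hypothetical DAtt-style module; the thesis's exact formulation may differ.
        def __init__(self, query_dim, feat_dim, hidden=512):
            super().__init__()
            self.q1 = nn.Linear(query_dim, hidden)  # projects the original query
            self.q2 = nn.Linear(feat_dim, hidden)   # projects the first attended result
            self.k = nn.Linear(feat_dim, hidden)    # projects the inputs as keys

        def attend(self, q, keys, values):
            # q: (B, hidden); keys: (B, N, hidden); values: (B, N, feat_dim)
            scores = torch.bmm(keys, q.unsqueeze(2))   # (B, N, 1)
            alpha = F.softmax(scores, dim=1)
            return (alpha * values).sum(dim=1)         # (B, feat_dim)

        def forward(self, query, feats):
            # query: (B, query_dim), e.g. the question feature
            # feats: (B, N, feat_dim), word or region features
            keys = torch.tanh(self.k(feats))
            first = self.attend(torch.tanh(self.q1(query)), keys, feats)
            # Second pass: the first attended result becomes the query, which
            # suppresses inputs that only weakly match the original query.
            return self.attend(torch.tanh(self.q2(first)), keys, feats)

Applied on the textual side, feats would be word features; on the visual side, region features, matching the textual/visual double attention split described above.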
Keywords/Search Tags: Visual question answering, co-attention, double attention, self-attention, question type