
Research On Visual Question Answering Method With Visual Content Understanding And Text Information Analysis

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zhang
Full Text: PDF
GTID: 2428330623967786
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence and machine learning, computer vision and natural language processing have attracted increasing attention from researchers, especially at the intersection of images and text. Visual Question Answering (VQA), which requires understanding an image together with a paired natural-language question, is one of the most compelling research areas at this intersection. Research on traditional vision tasks such as object detection and image segmentation has surpassed human recognition ability, and classic language tasks such as machine translation have likewise approached human cognitive standards, so progress in those directions has hit a bottleneck. Researchers have therefore shifted their focus from these traditional directions toward joint image-and-text problems, and VQA has developed rapidly with the boost of deep learning in both fields.

Existing VQA methods rely heavily on the knowledge contained in the training dataset. However, some questions require more specialized cues than the dataset alone provides. To address this issue, we propose a novel framework named Knowledge-based Augmentation Network (KAN), which introduces object-related open-domain knowledge to assist question answering. Concretely, we extract richer visual information from images and incorporate a knowledge graph to supply the common sense or experience needed by the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge relative to detected objects is balanced adaptively. Extensive experiments on two challenging VQA datasets, VQA v2 and VQA-CP v2, show that our model achieves state-of-the-art performance. In addition, the open-domain knowledge is also beneficial to VQA baselines on knowledge-related questions.
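The abstract does not give KAN's exact formulation, but the question-adaptive balancing it describes can be sketched in a minimal form: the question embedding attends separately over detected-object features and knowledge-graph entity embeddings, and a question-derived scalar gate mixes the two pooled summaries. All names, shapes, and the sigmoid gate below are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_guided_fusion(q, obj_feats, kg_feats):
    """Hypothetical sketch of question-adaptive fusion of two inputs.

    q         : (d,)   question embedding
    obj_feats : (n, d) detected-object features
    kg_feats  : (m, d) knowledge-graph entity embeddings
    Returns a (d,) fused representation.
    """
    # Question-conditioned attention within each modality
    obj_att = softmax(obj_feats @ q)      # (n,) weights over objects
    kg_att = softmax(kg_feats @ q)        # (m,) weights over knowledge
    obj_sum = obj_att @ obj_feats         # (d,) pooled object summary
    kg_sum = kg_att @ kg_feats            # (d,) pooled knowledge summary

    # A scalar gate computed from the question decides how much
    # external knowledge matters for this particular question.
    gate = 1.0 / (1.0 + np.exp(-(q @ (obj_sum - kg_sum))))
    return gate * obj_sum + (1.0 - gate) * kg_sum
```

In a trained model the attention scores and the gate would come from learned projections rather than raw dot products; the sketch only shows how a single mechanism can weight knowledge against visual evidence per question.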
Keywords/Search Tags:visual question answering, computer vision, natural language processing, knowledge base, attention mechanism