
Research On Language Ambiguity Elimination Methods In Visual Question Answering (VQA)

Posted on: 2021-01-09    Degree: Master    Type: Thesis
Country: China    Candidate: W Deng    Full Text: PDF
GTID: 2438330626964358    Subject: Electronic and communication engineering
Abstract/Summary:
With the development of multimedia and the Internet, how to process massive amounts of image and text information has become an urgent problem. Research at the intersection of computer vision and natural language processing has therefore become a focus for scholars, and Visual Question Answering (VQA) is one of its hot topics. The VQA task requires a machine, given a question and an image, to answer the question based on its understanding of the image. VQA involves technologies such as semantic understanding, image detection and recognition, and knowledge reasoning; it requires machines to understand images the way humans do and to interact with users in natural language. It is therefore important for improving the intelligence of artificial-intelligence systems such as robots. VQA has received extensive attention in recent years, and a large body of related work has emerged.

Generally speaking, a VQA model must process the visual information of the image and the textual information of the question simultaneously, mapping the extracted visual and text features into the same high-dimensional space through feature fusion. This requires the model to parse the semantics of the question correctly so that it can combine them with the visual features to give the correct answer.

For complex questions, language ambiguity often biases how existing models capture the text information, making it difficult for existing VQA systems to grasp the true meaning of the question. When an answer is wrong, humans can try to understand the question in other ways to obtain different answers. Inspired by this, this paper proposes a VQA method based on yes/no feedback. The method first uses the yes/no feedback mechanism to determine whether the initial answer is right or wrong. When the user's feedback is "no", the model re-analyzes the question, generates new disambiguated questions, and produces different candidate answers, then outputs the highest-confidence answer as the final result. We compare our method with existing methods on two benchmark datasets, CLEVR and CLEVR-CoGenT. On CLEVR, the accuracy of our method approaches 100%; on CLEVR-CoGenT, its accuracy is 21% higher than that of existing methods.
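The feature-fusion step described above (mapping visual and text features into the same high-dimensional space) can be sketched as follows. This is a generic illustration, not the thesis's actual model: the projection matrices `W_v`, `W_q`, the dimensions, and the element-wise-product fusion are all assumed stand-ins for whatever fusion scheme the thesis uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example dimensions: 2048-d image feature (e.g. a CNN output),
# 300-d question feature (e.g. an RNN state), fused into a 512-d joint space.
d_v, d_q, d = 2048, 300, 512
W_v = rng.standard_normal((d, d_v)) * 0.01   # hypothetical learned projections
W_q = rng.standard_normal((d, d_q)) * 0.01

v = rng.standard_normal(d_v)   # visual feature vector
q = rng.standard_normal(d_q)   # text feature vector

# Project both modalities into the same d-dimensional space, then combine
# them by element-wise product -- one common VQA fusion scheme.
fused = np.tanh(W_v @ v) * np.tanh(W_q @ q)
print(fused.shape)   # (512,)
```

The fused vector would then feed an answer classifier; other fusion choices (concatenation, bilinear pooling) drop in at the same point.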
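The yes/no feedback loop described above can be sketched in pseudocode-like Python. Everything here is a hypothetical stand-in: `answer_question`, `rephrase_question`, the toy question/answer table, and the confidence scores are illustrative only and are not from the thesis; the sketch only mirrors the control flow (first answer, yes/no feedback, disambiguation, highest-confidence candidate).

```python
def answer_question(question, image):
    """Hypothetical VQA model: returns (answer, confidence)."""
    toy_model = {
        "What color is the cube?": ("red", 0.4),
        "What color is the large cube?": ("blue", 0.9),
        "What color is the small cube?": ("red", 0.7),
    }
    return toy_model.get(question, ("unknown", 0.0))

def rephrase_question(question):
    """Hypothetical disambiguation step: produce less ambiguous variants."""
    return [
        question.replace("the cube", "the large cube"),
        question.replace("the cube", "the small cube"),
    ]

def vqa_with_feedback(question, image, user_says_yes):
    # First pass: answer the question as asked.
    answer, _ = answer_question(question, image)
    if user_says_yes(answer):
        return answer
    # Feedback was "no": re-analyze the question, generate disambiguated
    # variants, answer each, and return the highest-confidence candidate.
    candidates = [answer_question(q, image) for q in rephrase_question(question)]
    return max(candidates, key=lambda c: c[1])[0]

result = vqa_with_feedback("What color is the cube?", None,
                           user_says_yes=lambda a: a == "blue")
print(result)   # "blue": the highest-confidence disambiguated answer
```

If the user accepts the first answer, the loop terminates immediately; disambiguation is invoked only on a "no".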
Keywords/Search Tags:Visual question answering, Computer vision, Natural language processing, Syntactic disambiguation, Feedback