Research On Multi-hop Reasoning Method And Interpretability Based On Graph Network For Visual Question Answering

Posted on:2024-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Xu

GTID:2568307178491524

Subject:Computer Science and Technology

Abstract/Summary:

Visual Question Answering (VQA) is an important task in vision-language learning that aims to automatically answer natural language questions based on image content. At the intersection of computer vision and natural language processing, VQA has a wide range of practical applications in areas such as human-computer interaction and intelligent transportation. A key aspect of VQA is the need to reason about the relationships between visual entities in the image in the context of the question. Existing VQA methods are not accurate enough at determining multi-hop relationships between image entities in complex questions and cannot provide a clear reasoning process, so the resulting models lack interpretability. To address these issues, this thesis applies graph networks to VQA and conducts the following research:

(1) To address the challenge of performing multi-hop reasoning that captures the interrelationships and interactions between visual entities when solving complex questions, this thesis proposes a Question-Guided Multi-hop Reasoning Graph Network (QMRGT). It represents the multi-hop reasoning process of visual question answering as multiple question-guided rounds of dynamic interaction and updating between image entities. At each reasoning step, the network updates the question instructions and the visual entity representations in both directions, and captures the relationships between visual entities on the graph with a question-guided message-passing algorithm, ensuring coherent and consistent multi-hop reasoning for VQA. The interpretability of the method is demonstrated by analysing how the weights of the question-related visual entities change at each reasoning step.

(2) To further enhance the robustness of the model's reasoning ability and the reliability and correctness of the reasoning chain, this thesis builds on the framework of work (1) and proposes an Adaptive Path Reinforced Reasoning Graph Network (APRGT). It transforms the multi-hop reasoning process of visual question answering into a reasoning-path expansion task in order to learn multi-hop relations between visual entities. Using a self-adaptive path expansion method, the network independently explores and expands reasoning paths on the graph according to the question, and makes accurate and transparent reasoning decisions along the reasoning chain. By analysing the adaptive expansion of the reasoning paths, the method clearly represents the complete reasoning process, further enhancing the interpretability and reasoning robustness of the model.

Finally, this thesis conducts a series of comparative and ablation experiments on the public VQA datasets GQA, GQA-OOD and VQA2.0. The results verify the effectiveness of the proposed methods on VQA tasks, especially on complex questions that require multi-hop reasoning and on questions that test out-of-distribution generalization. Qualitative experimental analysis also demonstrates that the approach enhances interpretability.
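
The question-guided, bidirectional update described in (1) can be illustrated with a small sketch. This is not the QMRGT architecture from the thesis, whose details are not given in the abstract. It is a minimal NumPy illustration under assumed choices: dot-product relevance scores, a fixed 0.5 mixing factor instead of learned parameters, and a hypothetical function name `question_guided_hops`. It shows only the general idea of alternating between updating entity representations through question-weighted message passing and updating the question instruction from the attended entities, while recording per-hop entity weights for inspection.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def question_guided_hops(nodes, adj, question, num_hops=2):
    """Illustrative multi-hop reasoning loop (assumed design, not the thesis model).

    nodes:    (N, d) visual entity features
    adj:      (N, N) 0/1 adjacency matrix of the entity graph
    question: (d,)   question instruction vector
    Returns updated nodes, updated instruction, and per-hop entity weights.
    """
    h = nodes.astype(float).copy()
    q = question.astype(float).copy()
    hop_weights = []
    for _ in range(num_hops):
        # Relevance of each entity to the current instruction.
        alpha = softmax(h @ q)
        hop_weights.append(alpha)
        # Each entity aggregates its neighbours, weighted by their relevance.
        w = adj * alpha[None, :]
        norm = w.sum(axis=1, keepdims=True)
        w = np.divide(w, norm, out=np.zeros_like(w), where=norm > 0)
        messages = w @ h
        # Entity update (fixed mixing factor stands in for learned layers).
        h = 0.5 * (h + messages)
        # Instruction update from the attended entity summary.
        q = 0.5 * (q + alpha @ h)
    return h, q, hop_weights

# Toy graph: three entities connected in a chain 0 - 1 - 2.
nodes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
question = np.array([1.0, 0.0])

h_out, q_out, hop_weights = question_guided_hops(nodes, adj, question)
for step, weights in enumerate(hop_weights, start=1):
    print(f"hop {step} entity weights:", np.round(weights, 3))
```

Printing the weights at each hop mirrors, in a very simplified way, the kind of analysis the abstract describes: inspecting how the weights of question-related entities change from one reasoning step to the next.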

Keywords/Search Tags:

visual question answering, multi-hop reasoning, interpretability, graph networks

Related items

1	Visual Question Answering Based On Deep Reasoning
2	Research On Situational Reasoning Visual Question Answering Based On Graph Neural Network
3	Research Of Visual Question Answering Method Based On Deep Learning
4	Question-Guided Attention Reasoning Mechanism For Visual Question Answering
5	Research On Visual Question-Answering Methods Based On Attention Mechanism
6	Research On Key Algorithms Of Visual Question Answering Based On External Knowledge And Semantic Understanding
7	Research On Visual Question Answering Based On Knowledge Graph And Answer Space Optimization
8	Attention Mechanism And High-level Semantics For Visual Question Answering
9	Research On Knowledge Graph Completion Algorithm Based On Multi-hop Relation Question Answering
10	Research On Visual Question Answering Algorithm Based On Spatial Attention Reasoning Mechanism