
Research On Visual Reasoning With Graph Structure

Posted on: 2022-03-24    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G H Li    Full Text: PDF
GTID: 1488306746956669    Subject: Computer Science and Technology
Abstract/Summary:
Visual reasoning is an important task for visual-text multi-modal understanding and intelligent assessment. Its goal is to build a question answering system with visual perception, language understanding, and reasoning capabilities. Although related research has made significant progress, it still faces problems such as inadequate knowledge fusion and weak reasoning ability, which limit the further development of intelligent visual reasoning systems. To address these challenges, this dissertation exploits two complementary advantages of graph structure, its ability to represent relational data and its interpretability, and studies how to incorporate graph-structured representations into visual reasoning tasks. This helps to break through existing limitations and advance research toward more intelligent visual reasoning systems. From three perspectives, 1) integrating external knowledge graphs to empower traditional models, 2) constructing modular reasoning graphs to improve interpretability, and 3) learning semantic spatio-temporal graphs to enhance representation ability, this dissertation proposes three visual reasoning methods based on graph structure. The main contributions are as follows:

· An augmented visual reasoning method based on external knowledge graphs. Most visual reasoning methods analyze only the given images and questions, and so fail to answer visual questions that require external knowledge. Some works try to introduce external knowledge graphs, but they are often difficult to apply to diverse open-domain scenarios. This dissertation proposes a knowledge graph augmented model that explores the relationship between image-question context entities and answer entities in the knowledge graph space, and enhances traditional data-driven methods from a knowledge-driven perspective. Experimental results show that the model improves the performance of the base model under a variety of experimental settings.

· An interpretable visual reasoning method based on modular reasoning graphs. Traditional visual reasoning methods usually use a monolithic neural network model whose reasoning procedure often lacks interpretability. Some recent works adopt the idea of modularization and replace the monolithic model with a set of predefined neural modules, but it is still difficult to achieve interpretable modular reasoning in real-world visual scenes. This dissertation proposes a novel modular visual reasoning model. By designing a new group of neural modules applicable to real-world scenarios and a dynamic multi-task optimization strategy on the reasoning graph (tree), the model decouples the functionalities of the neural modules and realizes an explainable and compositional reasoning procedure in real-world visual scenes.

· A self-supervised visual reasoning method based on semantic spatio-temporal graphs. Traditional visual reasoning models mostly use a series of non-semantic deep visual features as video representations, which makes it difficult to model the complex spatio-temporal interactions between objects in multi-object and multi-event scenarios. This dissertation proposes a self-supervised visual reasoning method based on semantic spatio-temporal graphs. By constructing object-level spatio-temporal graph representations of videos and self-supervised event recognition tasks, it forms object-level and event-level semantic constraints. Experimental results show that the method performs well and achieves a significant accuracy improvement over the baseline methods.
Keywords/Search Tags: Visual Reasoning, Visual Question Answering, Graph Structure