
Question-Guided Attention Reasoning Mechanism For Visual Question Answering

Posted on: 2021-05-15 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wan
GTID: 2428330623468547 | Subject: Engineering
Abstract/Summary:
The combination of new computing techniques and increasingly large datasets is transforming the research direction of many fields. Deep learning techniques are now widely used in both natural language processing (NLP) and computer vision (CV), and on some single-modal tasks deep learning models even outperform humans. As a result, multi-modal tasks such as visual question answering (VQA) have attracted the attention of many researchers. Given an image and a question about that image, a VQA model must understand and fuse the information from the two modalities and then generate an answer.

Existing approaches improve a model's reasoning ability by stacking attention mechanisms, without considering the guiding role that the question plays in the answering process. We therefore propose a question-guided visual reasoning cell that uses a memory to store the image information needed to answer. First, a command generation module generates a command from the question. Second, a visual attention mechanism extracts the image regions relevant to that command. Third, the cell's memory is updated with the extracted regions. Experimental results on the VQA2.0 dataset show that our model outperforms several fusion-based VQA techniques.

Although a visual attention mechanism focuses on salient image regions, it is insufficient for understanding the relationships between objects, which answering complicated questions often requires. In this thesis, we therefore use a question-guided graph attention network to capture contextual information between the objects in an image. Each node represents one object and is updated through iterative message passing conditioned on the command produced by the command generation module. Our approach outperforms attention-based methods on both the VQA2.0 and GQA datasets, and also outperforms several state-of-the-art techniques.
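The abstract does not give the model's equations. The NumPy sketch below is a minimal illustration of the described pipeline: command generation from the question, command-conditioned visual attention, memory update, and optional command-conditioned graph message passing over object nodes. Everything here is a hypothetical choice for illustration and not the thesis's actual formulation. That includes all function names, the weight shapes, the dot-product scoring, the `tanh` memory update, the residual ReLU node update, and initialising memory with the question vector. Weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def generate_command(words, question, W_step):
    """Hypothetical command generation: attend over question words using a
    step-specific projection of the whole-question vector.
    words: (L, d) word features; question: (d,); W_step: (d, d)."""
    query = W_step @ question                 # (d,) step-specific query
    weights = softmax(words @ query)          # (L,) attention over words
    return weights @ words                    # (d,) command vector

def visual_attention(regions, command, memory):
    """Attend to regions relevant to both the command and the current memory.
    regions: (K, d); command, memory: (d,)."""
    scores = regions @ (command * memory)     # (K,) relevance scores
    alpha = softmax(scores)                   # (K,) weights summing to 1
    return alpha @ regions, alpha             # (d,) attended feature, weights

def update_memory(memory, retrieved, W_mem):
    """Write retrieved visual information into memory (hypothetical tanh update).
    W_mem: (d, 2d)."""
    return np.tanh(W_mem @ np.concatenate([memory, retrieved]))

def graph_message_passing(nodes, command, W_msg):
    """One round of command-conditioned graph attention over object nodes.
    nodes: (K, d); W_msg: (d, d). Edge weight i->j depends on the command."""
    logits = (nodes * command) @ nodes.T      # (K, K) pairwise relevance
    A = softmax(logits, axis=1)               # each row sums to 1
    messages = A @ nodes @ W_msg.T            # (K, d) aggregated neighbour features
    return nodes + np.maximum(messages, 0.0)  # residual update with ReLU

def reason(words, question, regions, n_steps=3, use_graph=True, seed=0):
    """Run n_steps reasoning steps with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    d = question.shape[0]
    W_steps = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(n_steps)]
    W_mem = rng.normal(scale=(2 * d) ** -0.5, size=(d, 2 * d))
    W_msg = rng.normal(scale=d ** -0.5, size=(d, d))
    memory = question.copy()                  # hypothetical initialisation
    nodes = regions
    for t in range(n_steps):
        command = generate_command(words, question, W_steps[t])
        if use_graph:                         # contextualise objects first
            nodes = graph_message_passing(nodes, command, W_msg)
        retrieved, _ = visual_attention(nodes, command, memory)
        memory = update_memory(memory, retrieved, W_mem)
    return memory

rng = np.random.default_rng(1)
words = rng.normal(size=(8, 16))              # 8 question words, d = 16
question = words.mean(axis=0)                 # crude whole-question vector
regions = rng.normal(size=(10, 16))           # 10 detected object regions
memory = reason(words, question, regions)
print(memory.shape)                           # (16,)
```

Setting `use_graph=False` corresponds roughly to the first model, which uses the attention-only reasoning cell. Setting `use_graph=True` adds the relational step described for the second model, in which each object node aggregates information from the other nodes before attention is applied.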
Keywords/Search Tags: Visual Question Answering, Attention Mechanism, Graph Neural Networks