
Research And Application Of Visual Question Answering Based On Deep Neural Network

Posted on: 2020-07-08    Degree: Master    Type: Thesis
Country: China    Candidate: J L Liu    Full Text: PDF
GTID: 2428330575957062    Subject: Intelligent Science and Technology
Abstract/Summary:
With improvements in deep neural network (DNN) training methods and generalization ability, larger-scale annotated data, and more powerful parallel computing, neural-network-based technology has achieved disruptive progress and industrial application in single modalities (image, speech, text). However, higher-level cognitive functions such as multi-modal understanding and human interaction remain weak. To address this problem, this paper studies an extremely important research task in the field of multi-modal interaction: visual question answering (VQA). Previous research mainly fit neural networks to large amounts of data, and the resulting models lacked sufficient reasoning ability, interpretability, and generalization. This paper focuses on designing neural network architectures that fuse the image and the question with better integration and reasoning capabilities. The core of this paper is multi-modal fusion combined with multi-stage question-answering reasoning.

For multi-modal fusion, this paper first proposes, building on previous research, a Global-Local model that combines multiple image features to address the problem of multi-feature and multi-granularity feature fusion. Second, the proposed mix-order attention mechanism combines the advantages of first-order attention and second-order attention to obtain a stronger attention mechanism. For multi-stage reasoning, this paper holds that reasoning ability plays a very important role in the VQA task. We propose several deep network structures with reasoning ability, including a sequential visual reasoning model, a multi-step mix-order model, and a chain-of-reasoning model. These models verify that constructing explicit reasoning network structures has a positive effect on visual question answering.

Experimental results on four large VQA datasets demonstrate that the proposed models achieve state-of-the-art results. The paper also uses visualization to verify the partial interpretability of multi-step reasoning. Finally, we implement a VQA demonstration system so that the public can gain an intuitive understanding of what current VQA models can achieve.
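As a rough illustration of the general idea behind mixing attention orders (not the thesis's actual architecture, which is not specified here), the sketch below combines a first-order additive attention score with a second-order bilinear score over image regions, conditioned on a question vector. All names and dimensions (MixOrderAttention, hidden_dim, etc.) are hypothetical assumptions for the example.

```python
# Minimal sketch, assuming a PyTorch-style VQA setup with region-level image
# features and a single question embedding per example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixOrderAttention(nn.Module):
    def __init__(self, img_dim, q_dim, hidden_dim):
        super().__init__()
        # First-order branch: additive attention over projected features.
        self.first_img = nn.Linear(img_dim, hidden_dim)
        self.first_q = nn.Linear(q_dim, hidden_dim)
        self.first_score = nn.Linear(hidden_dim, 1)
        # Second-order branch: bilinear (element-wise product) interaction.
        self.second_img = nn.Linear(img_dim, hidden_dim)
        self.second_q = nn.Linear(q_dim, hidden_dim)

    def forward(self, img_feats, q_feat):
        # img_feats: (B, num_regions, img_dim); q_feat: (B, q_dim)
        q_exp = q_feat.unsqueeze(1)                                   # (B, 1, q_dim)
        # First-order score: tanh(W_v v + W_q q) projected to a scalar per region.
        s1 = self.first_score(
            torch.tanh(self.first_img(img_feats) + self.first_q(q_exp))
        ).squeeze(-1)                                                 # (B, num_regions)
        # Second-order score: inner product of projected image and question features.
        s2 = (self.second_img(img_feats) * self.second_q(q_exp)).sum(-1)
        # Mix the two orders and normalize over regions.
        alpha = F.softmax(s1 + s2, dim=1)                             # (B, num_regions)
        # Attention-weighted image representation for downstream fusion.
        return torch.bmm(alpha.unsqueeze(1), img_feats).squeeze(1)    # (B, img_dim)
```

The attended image vector returned here would typically be fused with the question representation and fed to an answer classifier; stacking such an attention step several times is one simple way to realize the multi-step reasoning the abstract describes.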
Keywords/Search Tags: Visual Question Answering, Deep Neural Network, Multi-modal Fusion, Reasoning