
Research And Application Of Visual Question Answering Based On Deep Neural Network

Posted on: 2020-07-08    Degree: Master    Type: Thesis
Country: China    Candidate: J L Liu    Full Text: PDF
GTID: 2428330575957062    Subject: Intelligent Science and Technology
Abstract/Summary:
With improvements in deep neural network (DNN) training methods and generalization ability, larger-scale annotated data, and more powerful parallel computing, neural-network-based technology has achieved disruptive progress and industrial application in single modalities (image, speech, text). However, higher-level cognitive functions such as multi-modal understanding and human interaction remain weak. To address this problem, this paper studies an extremely important research task in the field of multi-modal interaction: visual question answering (VQA). Previous research mainly fit neural networks to large amounts of data, and the resulting models lacked sufficient reasoning ability, interpretability, and generalization. This paper focuses on designing neural network architectures that fuse the image and the question with better integration and reasoning capabilities. The core of this paper is multi-modal fusion combined with multi-stage question-answering reasoning.

For multi-modal fusion, this paper first proposes, building on previous research, a Global-Local model that combines multiple image features to address the problem of multi-feature and multi-granularity feature fusion. Second, the proposed mix-order attention mechanism combines the advantages of first-order attention and second-order attention to obtain a stronger attention mechanism. For multi-stage reasoning, this paper holds that reasoning ability plays a very important role in the VQA task. We propose several deep network structures with reasoning ability, including a sequential visual reasoning model, a multi-step mix-order model, and a chain-of-reasoning model. These models verify that constructing explicit reasoning network structures has a positive effect on visual question answering.

Experimental results on four large VQA datasets demonstrate that the proposed models achieve state-of-the-art results. The paper also uses visualization to verify the partial interpretability of multi-step reasoning. Finally, we implement a VQA demonstration system so that the public can gain an intuitive understanding of what current VQA models can achieve.
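As a rough illustration of the general idea behind mixing attention orders (not the thesis's actual architecture, which is not specified here), the sketch below combines a first-order additive attention score with a second-order bilinear score over image regions, conditioned on a question vector. All names and dimensions (MixOrderAttention, hidden_dim, etc.) are hypothetical assumptions for the example.

```python
# Minimal sketch, assuming a PyTorch-style VQA setup with region-level image
# features and a single question embedding per example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixOrderAttention(nn.Module):
    def __init__(self, img_dim, q_dim, hidden_dim):
        super().__init__()
        # First-order branch: additive attention over projected features.
        self.first_img = nn.Linear(img_dim, hidden_dim)
        self.first_q = nn.Linear(q_dim, hidden_dim)
        self.first_score = nn.Linear(hidden_dim, 1)
        # Second-order branch: bilinear (element-wise product) interaction.
        self.second_img = nn.Linear(img_dim, hidden_dim)
        self.second_q = nn.Linear(q_dim, hidden_dim)

    def forward(self, img_feats, q_feat):
        # img_feats: (B, num_regions, img_dim); q_feat: (B, q_dim)
        q_exp = q_feat.unsqueeze(1)                                   # (B, 1, q_dim)
        # First-order score: tanh(W_v v + W_q q) projected to a scalar per region.
        s1 = self.first_score(
            torch.tanh(self.first_img(img_feats) + self.first_q(q_exp))
        ).squeeze(-1)                                                 # (B, num_regions)
        # Second-order score: inner product of projected image and question features.
        s2 = (self.second_img(img_feats) * self.second_q(q_exp)).sum(-1)
        # Mix the two orders and normalize over regions.
        alpha = F.softmax(s1 + s2, dim=1)                             # (B, num_regions)
        # Attention-weighted image representation for downstream fusion.
        return torch.bmm(alpha.unsqueeze(1), img_feats).squeeze(1)    # (B, img_dim)
```

The attended image vector returned here would typically be fused with the question representation and fed to an answer classifier; stacking such an attention step several times is one simple way to realize the multi-step reasoning the abstract describes.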
Keywords/Search Tags: Visual Question Answering, Deep Neural Network, Multi-modal Fusion, Reasoning