Research On Essay-level Image-text Question Answering

Posted on:2019-04-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Z Li

Full Text:PDF

GTID:1368330590951475

Subject:Computer Science and Technology

Abstract/Summary:

Essay-Level Image-Text Question Answering is a newly proposed cross-domain task. It combines the Textual and Visual Question Answering tasks and requires an intelligent system to answer a question based on a given image and a long text. Compared with Textual or Visual Question Answering alone, the task is closer to how a person answers questions: by combining visual information with background knowledge. It therefore holds better prospects for advancing machine comprehension.

The new task also brings new challenges. On the one hand, Textual Question Answering tasks supply only paragraph-level texts as background knowledge, so existing methods cannot extract features directly from long text. On the other hand, Visual Question Answering tasks also handle only short text, and existing multi-modal fusion methods cannot process long text either. In view of these characteristics and challenges, this dissertation proposes a multi-level solution inspired by existing methods for Textual and Visual Question Answering. The main contributions are as follows:

1. Image-Text Question Answering with Word Embedding to Record the Essay Information, which addresses the long-text problem by exploiting the redundant space of word embeddings. The method embeds the significant essay information into the representation space of keywords selected from the essay. Without introducing any large new structure, it handles the multi-modal fusion problem on top of an existing Visual Question Answering framework.

2. Image-Text Question Answering with Essay Recording by Network Embedding under Joint Optimization, which addresses the mismatch between the embedding space of the keywords and that of the other words. The method narrows the gap between the two embedding spaces using network embedding methods with transferability. It also applies joint optimization to counter the tendency of network embedding methods to fall into local optima.

3. Explicit Reasoning System for Image-Text Question Answering Based on Contradiction Entity-Relationship Graphs, which uses discrete structures to represent the image and the text. Because existing symbolic methods are strong at extracting local features but weak at transferring them, this method compares and reasons over the explicit features of the image and the text using contradiction semantics, which transfer easily.

4. Multi-Modal Memory Networks under Instruction from the Contradictions. Drawing on the respective strengths of symbolic methods and deep neural networks, this method fuses the two through attention mechanisms and memory networks so as to exploit the advantages of both.

This dissertation is an exploration of the Essay-Level Image-Text Question Answering task. Its results have theoretical significance and offer a useful reference for the future development of this new cross-domain task.
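To make the idea of attention over a multi-modal memory concrete, here is a minimal, generic sketch of a single memory-network read step. It is not the dissertation's architecture: the function names (`softmax`, `dot`, `memory_read`), the three-dimensional toy vectors, and the split into text and image slots are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(query, memory):
    """One attention read: score each slot against the query,
    normalize the scores, and return the weighted sum of slots."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(query)
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]
    return weights, read


# Hypothetical toy features: two text slots and one image slot
# sharing a single memory, so attention can pick from either modality.
text_slots = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_slots = [[0.0, 0.0, 1.0]]
memory = text_slots + image_slots

query = [2.0, 0.0, 0.0]  # hypothetical question encoding
weights, read = memory_read(query, memory)
```

In this toy setup the query aligns with the first text slot, so that slot receives the largest attention weight, while the remaining text and image slots share the rest equally. A real system would learn the encodings and typically perform several such reads.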

Keywords/Search Tags:

Image-Text Question Answering, Long Text, Word Embedding, Multi-Modal Fusion

Related items

1	A Chinese Question Answering Approach Integrating Count-based And Embedding-based Features
2	Scene Text-based Visual Question Answering With Text Understanding
3	Research And Implementation Of Multimodal Information Fusion Annotation System For Image-Text Mixed Data
4	Research On Text Semantic Similarity Algorithm In Intelligent Question Answering System
5	A Research On Question Answering Algorithm Based On Complex Structured Text
6	Research On Intelligent Question Answering Technology For Long Documents
7	Research On Multimodal Fusion For Visual Question Answering
8	Research On Automatic Text Abstract System Based On Chinese Long Text
9	Research On Domain-Dependent Automatic Question Answering Method
10	High-performance, open-domain question answering from large text collections