
Research On Essay-level Image-text Question Answering

Posted on: 2019-04-08; Degree: Doctor; Type: Dissertation
Country: China; Candidate: J Z Li; Full Text: PDF
GTID: 1368330590951475; Subject: Computer Science and Technology
Abstract/Summary:
Essay-Level Image-Text Question Answering is a newly proposed cross-domain task. It combines the Textual and Visual Question Answering tasks and requires an intelligent system to answer a question according to a given image and a long text. Compared with Textual Question Answering or Visual Question Answering alone, Essay-Level Image-Text Question Answering is closer to how a person answers questions: a person reaches the answer by combining visual information with background knowledge. The task therefore holds better prospects for advancing machine comprehension.

Nevertheless, the new task brings new challenges. On the one hand, Textual Question Answering tasks supply only paragraph-level text as background knowledge, so existing methods cannot extract features directly from long text. On the other hand, Visual Question Answering tasks also handle only short text, so existing multi-modal fusion methods cannot process long text either. Targeting the characteristics and challenges of the Essay-Level Image-Text Question Answering task, this dissertation proposes a multi-level solution inspired by existing methods for Textual and Visual Question Answering. The main contributions are as follows:

1. This dissertation proposes Image-Text Question Answering with Word Embedding to Record the Essay Information, which addresses the long-text problem by exploiting the redundant space of the word embeddings. The method embeds the significant information of the essay into the representation space of keywords selected from the essay. Multi-modal fusion is handled on top of an existing Visual Question Answering framework, without introducing any large new structure.

2. This dissertation proposes Image-Text Question Answering with Essay Recording by Network Embedding under Joint Optimization, which addresses the mismatch between the embedding space of the keywords and that of the other words. The method narrows the gap between the two embedding spaces using network embedding methods with transferability. It also applies joint optimization to counter the tendency of network embedding methods to fall into local optima.

3. This dissertation proposes an Explicit Reasoning System for Image-Text Question Answering Based on Contradiction Entity-Relationship Graphs, which uses a discrete structure to represent the image and the text. Existing symbolic methods are strong at extracting local features but weak at transferring them. This method therefore compares and reasons over the explicit features of the image and the text using contradiction semantics, which transfer easily.

4. This dissertation proposes Multi-Modal Memory Networks under Instruction from the Contradictions. Drawing on the respective characteristics of symbolic methods and deep neural networks, the method fuses the two through attention mechanisms and memory networks and so exploits the advantages of both.

This dissertation is an initial exploration of the Essay-Level Image-Text Question Answering task. Its results have theoretical significance and offer a useful reference for the future development of this new cross-domain task.
Keywords/Search Tags: Image-Text Question Answering, Long Text, Word Embedding, Multi-Modal Fusion