
Multimodal Question Answering Over Structured Data With Ambiguous Entities

Posted on: 2018-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: H D Li
Full Text: PDF
GTID: 2348330512490271
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, we have been witnessing profound changes in the way people satisfy their information needs. For instance, with the ubiquitous 24/7 availability of mobile devices, the number of search engine queries issued from mobile devices has reportedly overtaken the number issued from regular personal computers. One important aspect of this shift is that mobile device usage tends to favor input modalities other than the traditional keyboard and mouse interface, which in previous decades had quite clearly been the most prototypical input devices. Touch interfaces can directly substitute for some of the previous forms of interaction. Yet typing on mobile devices can be cumbersome, especially on the go, and voice recognition is not always practical in noisy environments. Thus, we have become used to seeing shorter emails with "Please excuse brevity"-style disclaimers.

At the same time, mobile devices open up new opportunities. We now have the ability to instantly snap pictures whenever we encounter something interesting in our daily lives. This has already led to simple photo-sharing startups such as Instagram being acquired for a reported $1 billion. Additionally, touch interfaces and stylus devices make it easier to sketch things that we are looking for, enabling a novel but little-explored form of information search. Keeping in mind the popular notion of a picture being worth a thousand words, these new modalities may in fact be more than just an alternative. In some cases, humans may have difficulty formulating a natural language query specific enough to make the answer evident. We conjecture that in such cases, additional multimodal input may aid the user in conveying their search intent to the question answering engine.

In this paper, we consider the task of multimodal question answering (QA), in which a user supplies not just a natural language query but also an image to satisfy their information need. The image can be a photograph or a human-drawn sketch. We focus on questions and answers that can be addressed using structured knowledge repositories. Our system tackles this challenging problem in multiple steps. First, we apply standard linguistic analysis methods to the natural language part of the query; our experiments show that this component alone already gives results comparable to those of previous systems. Subsequently, we draw on a novel algorithm based on optimizing a non-convex objective function with linear constraints, which allows us to jointly capture both linguistic and multimodal constraints in a single optimization problem. The final answers are then retrieved from the knowledge base. Our experiments show that this enables the system to answer even very challenging ambiguous entity queries with high accuracy.
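The abstract only names the shape of the optimization (a non-convex objective under linear constraints), not its concrete form. As a rough, hypothetical illustration of that shape, the following Python sketch scores knowledge-base candidates with two made-up modality scores and couples them through a non-convex product objective over a simplex relaxation; the score vectors, the product coupling, and all names here are assumptions for illustration, not the thesis's actual model:

import numpy as np
from scipy.optimize import minimize

# Hypothetical per-candidate scores for three candidate entities:
# s_lang[i]: linguistic match between the NL query and candidate i
# s_img[i]:  visual similarity between the query image and candidate i
s_lang = np.array([0.9, 0.4, 0.7])
s_img = np.array([0.2, 0.8, 0.6])

def objective(x):
    # The product of the two modality scores is a simple non-convex
    # coupling of linguistic and visual evidence; negated because
    # scipy minimizes.
    return -(s_lang @ x) * (s_img @ x)

n = len(s_lang)
x0 = np.full(n, 1.0 / n)                      # uniform starting relaxation
bounds = [(0.0, 1.0)] * n                     # each weight in [0, 1]
constraints = [{"type": "eq", "fun": lambda x: x.sum() - 1.0}]  # linear simplex constraint

res = minimize(objective, x0, method="SLSQP", bounds=bounds, constraints=constraints)
best = int(np.argmax(res.x))                  # round the relaxation to one entity
print("selected candidate:", best, "weights:", np.round(res.x, 3))

Rounding the relaxed weights to the top-scoring candidate loosely mimics the final answer-retrieval step; the thesis's actual pipeline would operate over structured knowledge-base entities and learned features rather than toy score vectors.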
Keywords/Search Tags: Question Answering, Multimodal, Multimedia Knowledge