
Context Based Multi-Image Visual Question Answering (VQA) in Deep Learning

Posted on: 2019-06-07
Degree: M.S.
Type: Thesis
University: University of Missouri - Kansas City
Candidate: Peddinti, Sudhakar Reddy
Full Text: PDF
GTID: 2478390017485284
Subject: Computer Science
Abstract/Summary:
Image question answering has gained huge popularity in recent years, driven by advances in Deep Learning and in computer processing hardware that together deliver higher accuracy and faster processing. Relating image details to natural language information is one of the most challenging tasks in Artificial Intelligence. Most recently, there has been tremendous interest both in creating datasets and in proposing deep neural network models that learn jointly from images and text through a question-answering task called Visual Question Answering (VQA). VQA brings human-computer interaction through AI a step closer. However, existing VQA models are limited: they attend only to certain parts of an image (its attributes) rather than understanding the semantics of the context in images.

In this thesis, we propose a semantic framework known as Context VQA (CVQA) that aims to extend the existing VQA models in two aspects. First, we built a contextual model that defines the semantics of similar contexts from a multi-image set instead of a single image. Within the CVQA framework, we propose a two-stage model that (1) identifies one or more images by mapping the semantic sense of the question to the contextual model built from similar contexts of the images, and (2) for the selected images, provides the appropriate answer to the given question based on the proposed contextual model. Second, CVQA enhances one of the existing VQA implementations (VGG-16) by extending it with a more complex model such as ResNet-152. We analyzed the performance of our CVQA framework on three datasets: DAQUAR, VQA version 1, and VQA version 2. In our experiments, we observed improvements in both accuracy and runtime. We also present a CVQA application for context-based visual question answering.
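The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: the contextual model is replaced here by plain text descriptions compared through bag-of-words cosine similarity, and the names `select_images`, `answer_question`, `contexts`, `threshold`, and `answer_fn` are hypothetical, introduced only for illustration. A real system would presumably use learned image and question features (for example from VGG-16 or ResNet-152) in both stages.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_images(question, contexts, threshold=0.2):
    """Stage 1 (assumed stand-in): keep images whose context description
    is similar enough to the question, best match first.

    `contexts` maps an image id to a text description of its context.
    `threshold` is an arbitrary illustrative cutoff.
    """
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(desc)), img) for img, desc in contexts.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [img for score, img in scored if score >= threshold]


def answer_question(question, image_ids, answer_fn):
    """Stage 2 (assumed stand-in): run a single-image VQA callable
    `answer_fn(image_id, question)` on each selected image."""
    return {img: answer_fn(img, question) for img in image_ids}


if __name__ == "__main__":
    contexts = {
        "kitchen.jpg": "a person cooking food in a kitchen",
        "beach.jpg": "people playing on a beach near the sea",
    }
    question = "Who is cooking in the kitchen?"
    selected = select_images(question, contexts)
    # The lambda stands in for a trained VQA model.
    print(answer_question(question, selected, lambda img, q: "<model answer>"))
```

Keeping the two stages as separate functions mirrors the split in the abstract: image selection can be changed or evaluated independently of the answering model.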
Keywords/Search Tags:VQA, Question answering, Context, Image