
Research Of Visual Question Answering Based On Image Processing Technology

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhang
Full Text: PDF
GTID: 2428330623968143
Subject: Software engineering
Abstract/Summary:
The visual question answering (VQA) task is to produce a correct natural-language answer given an image and a corresponding natural-language question. It involves both computer vision and natural language processing, and it can only be handled by combining multi-modal input with inference. Existing models reason over redundant visual features and introduce excessive image noise, and because reasoning is carried out on low-level semantic features of the image, it is difficult to tell whether a model has actually acquired an effective representation of the image. Current VQA systems are mainly evaluated on public datasets. This thesis focuses on VQA over mathematical charts, with the goal of further improving the accuracy of baseline models.

Given these problems, and because a more structured semantic representation of an image is better suited to inferential scenarios, an object-based VQA reasoning model is designed. It provides an interpretable, high-level structured semantic representation of the image and combines it with natural language understanding techniques to accomplish the reasoning task. The framework is divided into three parts: a visual parser, a question encoder, and a general reasoning module. The visual parser detects the objects in the image with an object detection (OD) model and derives the relevant attribute information of those objects from the detection results. The question encoder maps a natural-language question to a vector space, or another representation, with a recurrent neural network. The general reasoning module combines the image and question representations to complete the inference task.

Because the visual parser is built on an OD model, optimization strategies tailored to the characteristics of the FigureQA dataset are proposed for the Faster R-CNN and RefineDet models, reaching 91.57% mAP on bar charts and pie charts and 78.86% mAP on line graphs. To verify the object-based VQA reasoning model, its performance is evaluated on the FigureQA dataset open-sourced by Microsoft, achieving better experimental results than previously reported methods while reducing training time by 15% relative to the baseline model.
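To make the three-part pipeline concrete, the sketch below shows one way the question encoder and the general reasoning module could be wired together in PyTorch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the module names, dimensions, attention-style fusion, and binary answer space (FigureQA answers are yes/no) are all assumptions, and the object-level features are presumed to come from a separately trained detector such as Faster R-CNN.

```python
# Illustrative sketch only: module names, dimensions, and the fusion strategy
# are assumptions, not the thesis's exact design.
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Maps a tokenized question to a fixed-size vector with a recurrent network."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, question_tokens):            # (B, T) token ids
        embedded = self.embed(question_tokens)      # (B, T, embed_dim)
        _, hidden = self.rnn(embedded)              # (1, B, hidden_dim)
        return hidden.squeeze(0)                    # (B, hidden_dim)

class ReasoningModule(nn.Module):
    """Combines object-level image features with the question vector to predict an answer."""
    def __init__(self, obj_dim=256, q_dim=512, num_answers=2):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, q_dim)
        self.classifier = nn.Sequential(
            nn.Linear(q_dim, 512), nn.ReLU(), nn.Linear(512, num_answers))

    def forward(self, object_feats, q_vec):         # (B, N, obj_dim), (B, q_dim)
        objs = self.obj_proj(object_feats)           # (B, N, q_dim)
        # Attend over detected objects, using the question vector as the query.
        scores = torch.softmax((objs * q_vec.unsqueeze(1)).sum(-1), dim=1)   # (B, N)
        pooled = (scores.unsqueeze(-1) * objs).sum(1)                        # (B, q_dim)
        # Fuse the pooled object representation with the question and classify.
        return self.classifier(pooled * q_vec)       # (B, num_answers) answer logits
```

In use, `object_feats` would hold the features and attributes produced by the visual parser for each detected chart element, so the reasoning module never touches raw pixels, which is the structural advantage the object-based design claims over reasoning on low-level image features.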
Keywords/Search Tags:VQA, Visual Reasoning, Object Detection, Neural Network