
Research Of Visual Question Answering Based On Image Processing Technology

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhang
Full Text: PDF
GTID: 2428330623968143
Subject: Software engineering
Abstract/Summary:
The visual question answering (VQA) task is to produce a correct natural-language answer given an image and a corresponding natural-language question. It involves both computer vision and natural language processing, and it can only be handled by combining multi-modal input with inference. Existing models reason over redundant visual features and introduce excessive image noise, and because reasoning is carried out on low-level semantic features of the image, it is difficult to tell whether a model has actually acquired an effective representation of the image. Current VQA systems are mainly evaluated on public datasets. This thesis focuses on VQA over mathematical charts, with the goal of further improving the accuracy of baseline models.

Given these problems, and because a more structured semantic representation of an image is better suited to inferential scenarios, an object-based VQA reasoning model is designed. It provides an interpretable, high-level structured semantic representation of the image and combines it with natural language understanding techniques to accomplish the reasoning task. The framework is divided into three parts: a visual parser, a question encoder, and a general reasoning module. The visual parser detects the objects in the image with an object detection (OD) model and derives the relevant attribute information of those objects from the detection results. The question encoder maps a natural-language question to a vector space, or another representation, with a recurrent neural network. The general reasoning module combines the image and question representations to complete the inference task.

Because the visual parser is built on an OD model, optimization strategies tailored to the characteristics of the FigureQA dataset are proposed for the Faster R-CNN and RefineDet models, reaching 91.57% mAP on bar charts and pie charts and 78.86% mAP on line graphs. To verify the object-based VQA reasoning model, its performance is evaluated on the FigureQA dataset open-sourced by Microsoft, achieving better experimental results than previously reported methods while reducing training time by 15% relative to the baseline model.
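To make the three-part pipeline concrete, the sketch below shows one way the question encoder and the general reasoning module could be wired together in PyTorch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the module names, dimensions, attention-style fusion, and binary answer space (FigureQA answers are yes/no) are all assumptions, and the object-level features are presumed to come from a separately trained detector such as Faster R-CNN.

```python
# Illustrative sketch only: module names, dimensions, and the fusion strategy
# are assumptions, not the thesis's exact design.
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """Maps a tokenized question to a fixed-size vector with a recurrent network."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, question_tokens):            # (B, T) token ids
        embedded = self.embed(question_tokens)      # (B, T, embed_dim)
        _, hidden = self.rnn(embedded)              # (1, B, hidden_dim)
        return hidden.squeeze(0)                    # (B, hidden_dim)

class ReasoningModule(nn.Module):
    """Combines object-level image features with the question vector to predict an answer."""
    def __init__(self, obj_dim=256, q_dim=512, num_answers=2):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim, q_dim)
        self.classifier = nn.Sequential(
            nn.Linear(q_dim, 512), nn.ReLU(), nn.Linear(512, num_answers))

    def forward(self, object_feats, q_vec):         # (B, N, obj_dim), (B, q_dim)
        objs = self.obj_proj(object_feats)           # (B, N, q_dim)
        # Attend over detected objects, using the question vector as the query.
        scores = torch.softmax((objs * q_vec.unsqueeze(1)).sum(-1), dim=1)   # (B, N)
        pooled = (scores.unsqueeze(-1) * objs).sum(1)                        # (B, q_dim)
        # Fuse the pooled object representation with the question and classify.
        return self.classifier(pooled * q_vec)       # (B, num_answers) answer logits
```

In use, `object_feats` would hold the features and attributes produced by the visual parser for each detected chart element, so the reasoning module never touches raw pixels, which is the structural advantage the object-based design claims over reasoning on low-level image features.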
Keywords/Search Tags:VQA, Visual Reasoning, Object Detection, Neural Network