Research And Implementation Of Visual Question Answering System Based On Deep Learning

Posted on:2022-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:A Chang

GTID:2518306494992099

Subject:Software engineering

Abstract/Summary:

The development of neural networks, the growth of large-scale datasets, and improvements in computer hardware have driven major progress and wide application of deep learning in unimodal settings (image, text, voice). However, capabilities that correspond to higher-level human cognition and reasoning, such as multimodal understanding and interaction, remain weak. In response to this problem, this thesis studies an important research topic in the field of multimodal interaction: visual question answering (VQA).

Visual question answering involves two modalities, image and text. Because convolutional neural networks (CNNs) and recurrent neural networks (RNNs) perform well on images and text respectively, many models that combine the two have achieved good results on the VQA task. With the emergence of the attention mechanism, many attention-based models have also been proposed. Even so, the overall accuracy of existing models is still not ideal, especially on complex questions that require reasoning and counting.

To address the limited overall accuracy of current models, this thesis proposes a VQA model based on a layered joint attention mechanism. Then, to address the low accuracy of existing models on complex questions, the idea of visual reasoning is studied and a VQA model based on a reasoning network is proposed. Experimental results show that the model with reasoning ability answers complex questions significantly more accurately than other existing methods.

This thesis focuses on VQA models based on deep learning and uses deep learning methods to handle the VQA task. The main research contents include:

1) An in-depth study of the attention mechanism and the construction of a layered joint attention model that attends to images and questions in both directions. The model uses hierarchical attention to extract question features multiple times, and then uses joint attention to construct an image-question feature map that strengthens the relationship between the question and the image. Experiments show that the hierarchical joint attention model improves the modeling of image-question interrelationships and thereby the accuracy of the results.

2) To address the model's low accuracy on complex questions, a reasoning network module based on visual reasoning is built, which allows the model to extract features of complex questions and improves its reasoning ability.

3) ResNet-152 is used to extract deep image features, and a joint visual-text memory vector is built under the action of question attention and visual attention to help the model infer and predict the answer. Experiments on the original dataset show that the model achieves good results in predicting answers to complex questions.
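
The idea of joint attention between image and question can be illustrated with a minimal sketch. It is not the thesis's model: the function name `joint_attention`, the dot-product affinity, the max-pooling of affinities, and the toy two-dimensional features are all assumptions chosen for illustration. The sketch builds an image-question affinity map and uses it to weight image regions by the question and question words by the image.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Combine feature vectors using attention weights.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def joint_attention(image_regions, question_words):
    # Hypothetical helper: affinity[i][j] is the similarity between
    # image region i and question word j (assumed: plain dot product).
    affinity = [[dot(r, w) for w in question_words] for r in image_regions]

    # Question-to-image direction: score each region by its best-matching word.
    region_scores = [max(row) for row in affinity]
    # Image-to-question direction: score each word by its best-matching region.
    word_scores = [max(affinity[i][j] for i in range(len(image_regions)))
                   for j in range(len(question_words))]

    region_weights = softmax(region_scores)
    word_weights = softmax(word_scores)
    attended_image = weighted_sum(region_weights, image_regions)
    attended_question = weighted_sum(word_weights, question_words)
    return region_weights, word_weights, attended_image, attended_question

# Toy features (assumed values): three image regions, two question words.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[2.0, 0.0], [0.0, 0.1]]
region_w, word_w, img_vec, q_vec = joint_attention(regions, words)
print(region_w, word_w)
```

In this toy input the first region and the first word align most strongly, so each receives the largest attention weight in its own modality. A real model would learn projection matrices for the affinity and would operate on CNN and RNN features rather than hand-written vectors.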

Keywords/Search Tags:

Visual question answering, Deep learning, Visual reasoning, Attention mechanism, Inference network

Related items

1	Research On Situational Reasoning Question Answer Method Based On Deep Learning
2	Research On Visual Question Answer Algorithm Based On Attention Mechanism
3	Research On Visual Question Answering Models Based On Top-down Attention
4	Research On Collaborative Attention Model And Deep Correlated Networks For Visual Question Answer
5	Visual Question Answering Based On Object Relationship Modeling And Attention Mechanisms
6	Research On Visual Question And Answer Method Based On Supervised Learning
7	Research On Visual Question Answering Based On Deep Neural Network
8	Research On Visual Information Enhancement For Visual Question Answering
9	Research On Visual Question Answering Method With Attention Reasoning Mechanism
10	Visual Question Answering Based On Deep Reasoning