Research On VQA Task Based On Improved DMN+ Model

Posted on:2020-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

Full Text:PDF

GTID:2518305897470694

Subject:Computer application technology

Abstract/Summary:

Visual Question Answering (VQA) answers natural-language questions about images. It is a cross-disciplinary research area spanning natural language processing (NLP) and computer vision (CV). Since the VQA task was proposed and the first dataset was released in 2014, it has attracted growing attention and has, to some extent, advanced the development of artificial intelligence and deep learning. A VQA algorithm must reason out the correct answer from the visual elements of the image and general knowledge. Current mainstream VQA algorithms mainly use an attention mechanism to locate the image regions relevant to the question, but existing methods represent the question and the image in relatively simple ways, and the model can store only limited memory during reasoning. This thesis improves the dynamic memory network model (DMN+) to address these problems. First, two new input modules are proposed for the image: one based on attention-based feature fusion and one based on Bottom-up feature extraction. In addition, a Transformer is used to encode the question. With these changes, the performance of the DMN+ model on the VQA dataset improves significantly. Second, to accelerate training, the VQA dataset is divided by stratified sampling, an individual model is trained in parallel on each subset, and the final result is obtained by model fusion. Experiments on the VQA dataset show that this parallel training approach not only lowers the hardware requirements of the experiments but also increases the model's Accuracy_VQA by 4 points. Third, because the DMN+ model generates answers by choosing among multiple candidate answers, each sample is weighted according to the number of samples of its answer type, so as to balance the errors contributed by different samples in the VQA dataset. Experiments show that this weighted training improves the model's Accuracy_VQA on the VQA dataset by 1.5 points. Finally, to converge quickly to the optimal solution and to avoid extreme learning-rate values in the late stage of optimization, this thesis studies the effects of two learning-rate control methods. Results on the VQA dataset show that the learning-rate control strategy not only speeds up convergence but also improves the model's Accuracy_VQA by 1 point.
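The parallel-training step (split the VQA dataset by stratified sampling, train one model per subset, fuse the models' outputs) can be sketched roughly as follows. This is a minimal sketch under assumptions the abstract does not state: samples are stratified by answer type, and model fusion averages the per-answer probabilities of the individual models. The names `stratified_split` and `fuse_predictions` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical helpers: the abstract does not specify the stratification
# key or the fusion rule; answer type and probability averaging are assumed.

def stratified_split(samples, key, k, seed=0):
    """Split samples into k shards that each keep roughly the same mix of strata."""
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)
    rng = random.Random(seed)
    shards = [[] for _ in range(k)]
    pos = 0
    for name in sorted(strata):
        group = strata[name]
        rng.shuffle(group)
        # Deal each stratum round-robin, continuing where the previous one
        # stopped, so shard sizes differ by at most one sample.
        for s in group:
            shards[pos % k].append(s)
            pos += 1
    return shards

def fuse_predictions(prob_lists):
    """Average per-answer probabilities from several models; return the argmax index."""
    n = len(prob_lists)
    avg = [sum(p[j] for p in prob_lists) / n for j in range(len(prob_lists[0]))]
    return max(range(len(avg)), key=avg.__getitem__)

# Usage: 12 toy samples with an unbalanced answer-type mix, split into 3 shards.
types = ["yes/no"] * 6 + ["number"] * 3 + ["other"] * 3
samples = [{"id": i, "type": t} for i, t in enumerate(types)]
shards = stratified_split(samples, key=lambda s: s["type"], k=3)
print([len(sh) for sh in shards])  # → [4, 4, 4]
print(fuse_predictions([[0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]))  # → 1
```

Each shard would then train its own model independently, and a fusion step such as `fuse_predictions` would combine the trained models' outputs at test time. Voting or weighted averaging would be equally plausible fusion rules.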
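The answer-type weighting can be sketched as a weighted cross-entropy loss. The abstract says only that samples are weighted according to the number of samples of each answer type; the inverse-frequency formula below (every answer type receives the same total weight, and the weights average to 1) is an assumption, as are the names `answer_type_weights` and `weighted_nll`.

```python
import math
from collections import Counter

# Assumed weighting: inverse frequency of each sample's answer type.

def answer_type_weights(answer_types):
    """One weight per sample; rarer answer types get larger weights."""
    counts = Counter(answer_types)
    n, num_types = len(answer_types), len(counts)
    # n / (num_types * count) gives every answer type the same total weight
    # and makes the per-sample weights average to 1.
    return [n / (num_types * counts[t]) for t in answer_types]

def weighted_nll(probs_of_true, weights):
    """Weighted mean negative log-likelihood (cross-entropy) of the true answers."""
    total = sum(w * -math.log(p) for p, w in zip(probs_of_true, weights))
    return total / sum(weights)

types = ["yes/no"] * 6 + ["number"] * 2 + ["other"] * 4
w = answer_type_weights(types)
print(w[0], w[6], w[8])  # yes/no is about 0.667, number is 2.0, other is 1.0
```

Because `weighted_nll` divides by the weight sum, the loss stays on the same scale as the unweighted version, while frequent types such as yes/no contribute less per sample than rare ones.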
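The abstract does not name the two learning-rate control methods it studies. As a hypothetical illustration only, one common pattern is a linear warmup followed by step decay, clipped at a floor so the rate cannot become vanishingly small late in training. The function `lr_at` and all of its default values are assumptions, not the thesis's settings.

```python
def lr_at(step, base_lr=1e-3, warmup_steps=1000, decay_every=5000,
          decay_rate=0.5, min_lr=1e-5):
    """Hypothetical schedule: linear warmup, then step decay clipped at min_lr.

    Not necessarily either of the two methods studied in the thesis.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Multiply the rate by decay_rate once every decay_every steps after warmup,
    # but never let it fall below the floor min_lr.
    decays = (step - warmup_steps) // decay_every
    return max(base_lr * decay_rate ** decays, min_lr)

for s in (0, 999, 1000, 6000, 200000):
    print(s, lr_at(s))
```

The floor is one direct way to address the abstract's concern about extreme learning rates late in optimization; cosine annealing or reduce-on-plateau schedules would be equally plausible readings of "learning rate control".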

Keywords/Search Tags:

Visual question answering, attention mechanism, DMN+, Transformer, object recognition

Related items

1	Self-attention Mechanism Based Answer Selection In Question Answering System
2	Research On Visual Question Answer Algorithm Based On Attention Mechanism
3	Research On Visual Question Answering Models Based On Top-down Attention
4	Research On Visual Question Answering Algorithm Based On Image Description And Multi-level Attention Mechanism
5	Research On Collaborative Attention Model And Deep Correlated Networks For Visual Question Answer
6	Research On Visual Question And Answer Method Based On Supervised Learning
7	Research On Visual Information Enhancement For Visual Question Answering
8	Research On Visual Question Answering Algorithm Based On Feature Fusion Of Attention Mechanism
9	Research And Implementation Of Visual Question Answering System Based On Deep Learning
10	Research On Question Answering System Based On Attention Mechanism And Answer Verification