Research On VQA Task Based On Improved DMN+ Model

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2518305897470694
Subject: Computer application technology
Abstract:
Visual Question Answering (VQA) is the task of answering natural-language questions about images. It lies at the intersection of natural language processing (NLP) and computer vision (CV). Since the task was proposed and the first dataset was released in 2014, VQA has received growing attention and has, to some extent, promoted the development of artificial intelligence and deep learning. A VQA algorithm must infer the correct answer from the visual elements of the image together with general knowledge. Current mainstream VQA algorithms rely on attention mechanisms to find the image regions relevant to the question, but existing methods represent the question and the image in a simple way, and the model can store only limited memory during reasoning. To address these problems, this thesis improves the dynamic memory network model (DMN+).

First, two new image input modules are proposed: one based on attention-based feature fusion, and one based on the Bottom-up feature extraction method. In addition, the Transformer method is used to encode the question. With these changes, the performance of the DMN+ model on the VQA dataset improves significantly.

Second, to accelerate training, the VQA dataset is partitioned by stratified sampling, a separate model is trained in parallel on each partition, and the final result is obtained by model fusion. Experiments on the VQA dataset show that this parallel training approach not only lowers the hardware requirements of the experiments but also increases the model's AccuracyVQA index by 45 points.

Third, because the DMN+ model generates answers by multiple choice, the samples are weighted according to the number of samples of each answer type, in order to balance the errors contributed by each sample in the VQA dataset. Experiments report an AccuracyVQA figure of 11.5 for the weighted training method on the VQA dataset.

Finally, to converge quickly to the optimal solution during training and to avoid extreme learning-rate values in the late stage of optimization, this thesis studies the effects of two learning rate control methods. Results on the VQA dataset show that the learning rate control strategy not only speeds up convergence but also improves the model's AccuracyVQA index by 12.
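The abstract does not give formulas for the data partitioning, the per-sample weighting, or the model fusion step, so the following is only a minimal sketch under stated assumptions. It assumes each sample carries an answer-type label, uses inverse-frequency weights as one plausible reading of "weighted according to the number of answer type samples", and uses plain probability averaging as a stand-in for the unspecified fusion method. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_partitions(samples, key, num_parts, seed=0):
    """Split samples into num_parts subsets so that each subset keeps
    roughly the same answer-type proportions as the full dataset."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for sample in samples:
        by_type[key(sample)].append(sample)

    parts = [[] for _ in range(num_parts)]
    for group in by_type.values():
        rng.shuffle(group)
        # Deal each answer-type group round-robin across the partitions.
        for i, sample in enumerate(group):
            parts[i % num_parts].append(sample)
    return parts


def inverse_frequency_weights(samples, key):
    """Assumed weighting scheme: weight = N / (K * count(type)), so rare
    answer types count more and the mean weight over all samples is 1."""
    counts = Counter(key(s) for s in samples)
    total = len(samples)
    num_types = len(counts)
    return [total / (num_types * counts[key(s)]) for s in samples]


def average_fusion(prob_lists):
    """Assumed fusion scheme: average the answer probabilities predicted
    by the models trained on the separate partitions."""
    n = len(prob_lists)
    return [sum(column) / n for column in zip(*prob_lists)]


if __name__ == "__main__":
    # Toy dataset: (question id, answer type); the labels are illustrative.
    data = (
        [(i, "yes/no") for i in range(6)]
        + [(i, "number") for i in range(6, 9)]
        + [(i, "other") for i in range(9, 12)]
    )
    answer_type = lambda s: s[1]

    for part in stratified_partitions(data, answer_type, num_parts=3):
        print(Counter(answer_type(s) for s in part))

    weights = inverse_frequency_weights(data, answer_type)
    print(weights[0], weights[6])

    # Two hypothetical models scoring three candidate answers.
    print(average_fusion([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]))
```

In a training loop, the weights would typically multiply each sample's loss term, and each partition would be handed to its own independently trained model before the fusion step.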
Keywords: Visual question answering, attention mechanism, DMN+, Transformer, target recognition