
Research On Visual Question Answering Method Based On Dynamic Memory Network

Posted on: 2020-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: R Yang
GTID: 2518306464487094 | Subject: Computer application technology
Abstract/Summary:
As deep learning methods are increasingly applied to computer vision, the ability of computers to process images has improved greatly, and requirements and expectations for image understanding have risen accordingly. In recent years, visual question answering (VQA), a new challenge among image understanding tasks, has attracted wide attention from researchers and has gradually become a research hotspot. A VQA model gathers information about an image through questions posed about it, which lets it understand the image better. As an interdisciplinary field spanning computer vision and natural language processing, VQA uses computer vision methods to extract feature information from the input image while applying natural language processing methods to analyze the semantics of the input question. Research on VQA models has developed rapidly in recent years, and various types of methods have been proposed, including basic models, attention-based models, modular models, and models based on external knowledge bases. Even the most advanced models still have many problems in practical tasks, especially when the input question or image has a complex structure: lacking analytical reasoning ability, they cannot produce accurate answers. Based on a comprehensive comparison, analysis, and summary of existing VQA models and methods, and focusing on how VQA processes its question and image inputs, we construct a VQA model framework with a complete reasoning process. The following work was carried out:

(1) Following the modular design idea, an image repairing and processing method based on generative translation is proposed. By introducing a content generation module and a style transition module, the model achieves uniformity and integrity in image repairing. The method gives good experimental results when applied to the repair of face images.

(2) By augmenting the semantic information of the question through iterative queries, a logic analysis model based on a query generation network is proposed. It constructs a reasoning process by obtaining evidence through iterative knowledge queries. This method achieves excellent results on a variety of VQA analysis tasks.

(3) Building on the dynamic memory network, and introducing an observation network based on the query generation network and a knowledge source based on a knowledge network, a VQA model framework with a complete reasoning process is proposed. The proposed VQA model uses a modular architecture to take full advantage of each module, so that it can process and analyze tasks quickly. At the same time, through visual attention mapping and the iterative query method, the framework provides a partially interpretable reasoning process and an effective feedback mechanism for optimizing the model.
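The iterative, attention-driven reasoning mentioned in the abstract can be illustrated with a minimal sketch of a multi-pass episodic memory. This is not the thesis's model: the function names, the dot-product scoring, and the fixed 0.5 blending factor are all assumptions chosen for illustration, whereas a dynamic memory network would normally use learned gates and a recurrent update. The per-pass attention weights are returned because they are the kind of signal that makes such a reasoning process partially inspectable.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def episodic_memory(facts, question, passes=3):
    """Hypothetical multi-pass memory update (illustrative only).

    facts:    list of feature vectors (e.g. image region features)
    question: question vector with the same dimension as each fact
    Returns the final memory vector and the attention weights per pass.
    """
    memory = list(question)  # assumption: memory starts as the question
    history = []
    for _ in range(passes):
        # Score each fact against the question and the current memory.
        scores = [dot(f, question) + dot(f, memory) for f in facts]
        weights = softmax(scores)
        # The episode is the attention-weighted sum of the facts.
        episode = [
            sum(w * f[i] for w, f in zip(weights, facts))
            for i in range(len(question))
        ]
        # Assumed fixed blend; a real model would learn this update.
        memory = [0.5 * m + 0.5 * e for m, e in zip(memory, episode)]
        history.append(weights)
    return memory, history
```

Calling `episodic_memory(facts, question, passes=3)` with toy vectors returns one weight list per pass, which can be compared across passes to see how attention shifts as the memory is refined.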
Keywords/Search Tags: visual question answering, dynamic memory networks, computer vision, image understanding