Research On Visual Question Answering Based On Attention Mechanism And Memory Network

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Yan
GTID: 2428330614960407
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, deep learning has become a research hotspot in artificial intelligence. It has been widely used in computer vision and natural language processing tasks and has achieved remarkable performance. As research has deepened, attention has turned to the representation and interaction of cross-media data, and visual question answering (VQA) is one of the most active topics. VQA lies at the intersection of computer vision and natural language processing. Its goal is for a computer to generate correct answers from an image and a question about that image. Most existing VQA models process images with top-down attention mechanisms, which neglects a complete representation of the image content and produces redundant image features. Most VQA models also use only a single attention mechanism, so the noise contained in the image and the question cannot be removed effectively. In addition, because these models lack a long-term memory module, part of the useful information is lost while the answer is being inferred, which affects the model's judgment of the answer. In view of these problems, this thesis studies VQA systems based on attention mechanisms and memory networks, aiming to improve VQA accuracy by strengthening the effective representation and long-term memory of the image and the question. The main work of this thesis is as follows:

(1) We propose a VQA model based on an attention-gated memory network. The model uses image features from bottom-up attention, implemented by using an object detection model to extract objects and other salient regions in the image. These features are then combined with the attention-gated memory network. After several iterations, the memory network retains effective information over the long term. Experiments on public datasets verify the effectiveness of the model.

(2) Based on the framework proposed in (1), we further propose an improved model based on multiple attention mechanisms. A bidirectional attention mechanism processes the image features and question features; it implements information interaction between the image and the question and effectively removes information irrelevant to answering the question. The processed image and question features are used as the input of the memory network to achieve more accurate retrieval and memory of effective information. In addition, an attention-based object counting module is introduced into the model to address the low accuracy on counting questions. A series of comparative experiments on public datasets verifies the good performance of the improved model.
Keywords/Search Tags:visual question answering, attention mechanism, memory network, bottom-up, bidirectional attention
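
The abstract describes attention over image-region features feeding a memory that is refined over several iterations. The following is a minimal sketch of that general idea, not the thesis's actual model. It assumes dot-product scoring, a fixed scalar gate, and a memory initialised from the question vector; the function names (attend, gated_update, run_memory) are hypothetical, and a real model would learn the gate and the projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(regions, query):
    """Weight each region feature by its similarity to the query vector.

    Dot-product scoring is an assumption made for this sketch.
    """
    weights = softmax([dot(r, query) for r in regions])
    dim = len(query)
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, context


def gated_update(memory, context, gate):
    """Blend the old memory with the attended context; gate lies in [0, 1].

    A fixed scalar gate is assumed here; a trained model would compute it.
    """
    return [gate * c + (1.0 - gate) * m for m, c in zip(memory, context)]


def run_memory(regions, question, hops=3, gate=0.5):
    """Start the memory at the question vector and refine it over several hops."""
    memory = list(question)
    for _ in range(hops):
        # The query combines the question with what the memory already holds.
        query = [q + m for q, m in zip(question, memory)]
        _, context = attend(regions, query)
        memory = gated_update(memory, context, gate)
    return memory
```

Here regions stands in for the bottom-up region features produced by an object detector, and question for an encoded question vector of the same dimension. Each hop re-attends to the regions using the current memory, so information selected in earlier hops influences later ones.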