
Research On Visual Question Answering Algorithm Based On Spatial Attention Reasoning Mechanism

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Li
GTID: 2518306119970719
Subject: Computer technology
Abstract/Summary:
With the development of deep learning, the field of artificial intelligence has entered a period of rapid growth. Deep learning has made significant breakthroughs in computer vision, natural language processing, speech processing, and other fields. In recent years, multi-modal learning tasks that span computer vision and natural language processing have attracted increasing attention from researchers. Visual question answering is one such multi-modal task: it aims to automatically generate a natural language answer to a question posed about the content of an image. Its input is multi-modal, consisting of image information and question information. The key to the task lies in the joint understanding of, and joint reasoning over, visual and natural language content. A visual question answering model contains four modules: an image and question feature extraction module, a feature fusion module, a multi-modal information processing module, and an answer generation module. Current research focuses mainly on the feature extraction module and the multi-modal information processing module. The main innovations and research work of this thesis are as follows:

(1) Because both the image and the question carry a large amount of information, this thesis uses current advanced object detection technology to extract image features and to fully detect the content of the image. The resulting object features carry both high-level semantic information and a large amount of detail. Similarly, a gated recurrent neural network is used to extract question features, yielding text features that accurately express the question.

(2) To obtain better features, a self-attention module is designed to compute self-attention features for the image and for the question text. The self-attention mechanism can effectively reduce noise and redundant information and yield a more accurate representation of each modality.

(3) Building on a study of how attention mechanisms handle multi-modal features, and in view of the shortcomings of current attention mechanisms for image and question features, this thesis proposes a spatial reasoning attention module. The module performs multi-modal feature fusion and multi-modal feature reasoning on the image features and question text features: the image and question features are fused into a unified representation of image and text, and attention reasoning is then performed on the fused features to obtain a fused feature that expresses spatial reasoning.

This thesis integrates several feature extraction techniques, improves the self-attention module, and proposes a spatial reasoning attention module to improve the visual question answering model. The model is evaluated experimentally on visual question answering datasets. The results show a significant improvement in accuracy over visual question answering models of the same type.
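
The abstract does not give formulas for its modules, so the following is only a rough illustration of the two ideas it names: self-attention within one modality, and attention over fused image and question features. It is a minimal sketch under stated assumptions, not the thesis's method. The scaled dot-product form without learned projections, the element-wise product fusion, the sum-based scoring, and all function names are hypothetical choices made for illustration, and the toy vectors stand in for object detection region features and a GRU question encoding.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(features):
    """Scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the features themselves
    (no learned projection matrices), to keep the sketch short.
    """
    d = len(features[0])
    refined = []
    for query in features:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in features])
        refined.append(
            [sum(w * value[j] for w, value in zip(weights, features)) for j in range(d)]
        )
    return refined


def attend_over_fused(regions, question):
    """Fuse each region with the question, then attend over the fused vectors.

    Assumptions: fusion is an element-wise product, and each fused vector
    is scored by the sum of its entries (a stand-in for a learned scorer).
    """
    fused = [[r * q for r, q in zip(region, question)] for region in regions]
    weights = softmax([sum(f) for f in fused])
    d = len(question)
    pooled = [sum(w * f[j] for w, f in zip(weights, fused)) for j in range(d)]
    return weights, pooled


# Toy inputs: three image regions and one question vector, all 2-dimensional.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.0]

refined_regions = self_attention(regions)
weights, pooled = attend_over_fused(regions, question)
```

In this toy setup, regions whose features overlap with the question vector receive larger attention weights, and `pooled` is the weighted sum of fused features that an answer generation module would consume. A real model would replace the identity projections and the sum-based scorer with learned layers.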
Keywords/Search Tags: visual question answering, feature extraction, multi-modal features, self-attention, spatial reasoning attention