Research On Visual Question And Answer Method Based On Supervised Learning

Posted on: 2021-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: B X Yan
GTID: 2518306047482094
Subject: Software engineering
Abstract/Summary:
Visual question answering (VQA) is an active research topic in deep learning. It combines computer vision and natural language processing, which makes it a challenging task. A VQA model takes an image and a question about that image as input, and answers the question based on both. Existing VQA models focus mainly on modeling the image and do not make full use of the semantic information in the question. In addition, these models often fail to answer accurately when a question concerns a fine-grained region of the image.

To address the insufficient use of question semantics, this thesis proposes a dual attention mechanism. To address the low accuracy on questions about fine-grained regions, it uses an iterative method that progressively reduces the granularity of the attended area and makes that area more precise. Combining these two ideas, the thesis proposes an attention-based VQA model: a dual attention model with iterative refinement. The model has five parts: (1) image feature extraction; (2) question feature extraction; (3) the dual attention framework; (4) the iterative method; and (5) answer prediction.

To verify the effectiveness of the model, supervised-learning experiments are conducted on commonly used datasets, and the results are compared with those of state-of-the-art models. The results show that the model implements two-way guidance between the image and the question, makes the region of interest more precise, and improves answer accuracy to some extent, which indicates that the approach is feasible.
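The abstract does not give the model's equations, so the following is only a minimal pure-Python sketch of one common way to realize dual attention with iterative narrowing, not the thesis's implementation. Assumptions: image region features and question word features are vectors of the same dimension; scoring uses scaled dot-product attention; "reducing the granularity of the attention area" is modeled by keeping only the top `keep_ratio` fraction of regions between passes; the element-wise fusion and the per-answer weight vectors (`answer_weights`) are hypothetical stand-ins for the answer-prediction part, and `cross_entropy` illustrates the supervised loss for one labeled example.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into probabilities (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average, used here as the initial question summary."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def attend(query: Vector, keys: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention: pool `keys` weighted by similarity to `query`."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    pooled = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]
    return pooled, weights


def dual_attention_iterative(regions: List[Vector], words: List[Vector],
                             steps: int = 2, keep_ratio: float = 0.5):
    """Alternate question->image and image->question attention, narrowing the
    set of attended regions between passes (hypothetical narrowing rule)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    q = mean(words)
    active = list(range(len(regions)))  # indices of regions still attended
    for step in range(steps):
        # Question-guided attention over the currently active image regions.
        v, region_w = attend(q, [regions[i] for i in active])
        # Image-guided attention over the question words refines q.
        q, _ = attend(v, words)
        if step < steps - 1:
            # Assumed granularity reduction: keep the top-weighted fraction.
            k = max(1, int(len(active) * keep_ratio))
            ranked = sorted(zip(region_w, active), reverse=True)
            active = sorted(i for _, i in ranked[:k])
    return v, q, active, region_w


def predict_answer(v: Vector, q: Vector, answer_weights: List[Vector]):
    """Fuse image and question vectors element-wise, then score each answer."""
    fused = [a * b for a, b in zip(v, q)]
    probs = softmax([dot(fused, w) for w in answer_weights])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def cross_entropy(probs: List[float], label: int) -> float:
    """Supervised loss for one example with a ground-truth answer index."""
    return -math.log(probs[label])


# Toy usage: 4 image regions and 3 question words, all 2-dimensional.
regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
words = [[1.0, 0.0], [0.8, 0.2], [0.0, 0.0]]
v, q, active, region_w = dual_attention_iterative(regions, words, steps=2)
pred, probs = predict_answer(v, q, answer_weights=[[1.0, 0.0], [0.0, 1.0]])
loss = cross_entropy(probs, label=0)  # assumed ground-truth answer index
print("attended regions:", active, "predicted answer:", pred)
```

With two passes, the first pass attends over all four regions and the second over only the two highest-weighted ones, which is the "narrowing" behavior the abstract describes in words. A trained model would typically learn projection matrices for queries and keys and obtain region and word features from pretrained encoders; those components are omitted to keep the sketch self-contained.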
Keywords/Search Tags: supervised learning, neural network, visual question answering, attention mechanism