
Research And Implementation Of Visual Question Answering System Based On Collaborative Attention Mechanism

Posted on: 2021-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Ge
GTID: 2438330626964282
Subject: Software engineering
Abstract/Summary:
Visual question answering (VQA) is a task in which a system takes an image and a question as input and combines the visual and textual information to produce a natural-language answer. It draws on both computer vision and natural language processing (NLP): computer vision is used to understand the input image, and NLP is used to understand the input question and generate the answer. The key to VQA lies in fusing the visual and linguistic features extracted from the image and the question. In recent years, many CNN+LSTM networks have shown good results, and more recently many networks have applied attention to VQA. Still, the accuracy of VQA is not ideal, especially on questions that involve relational reasoning and counting.

To address this problem, this study trains the network with a collaborative attention (co-attention) mechanism, whose function is to generate correlated feature pairs for image-question pairs, and uses a relational network (RN) to infer the relationships between objects in the image and between those objects and the question, helping the model predict the answer.

This thesis focuses on a VQA system based on the co-attention mechanism. The main research contents are:
1) Further study of the co-attention mechanism. An effective co-attention mechanism is built to generate dually correlated image-question attention features, so that the network can learn these paired features autonomously. Experiments show that this improves VQA accuracy.
2) To address the low accuracy of VQA on complex questions, a relational network (RN) inference module is constructed to further improve the model's reasoning ability, so that the model can extract features relevant to complex questions.
3) The features produced by the RN are fed into the co-attention module to extract correlated feature pairs, which help the model predict answers and improve the system's accuracy on complex questions such as relational inference.
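
The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the abstract gives no equations, layer sizes, or wiring details, so the function names (co_attention, relation_features), the bilinear affinity form, the single-layer ReLU pair function, and all dimensions below are assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(V, Q, W):
    """Hypothetical parallel co-attention.
    V: (n_regions, d) image region features
    Q: (n_words, d) question word features
    W: (d, d) bilinear weights (assumed form, not from the thesis)
    """
    C = np.tanh(Q @ W @ V.T)               # (n_words, n_regions) affinity
    a_v = softmax(C.max(axis=0), axis=0)   # attention weights over regions
    a_q = softmax(C.max(axis=1), axis=0)   # attention weights over words
    v_att = a_v @ V                        # attended image feature, (d,)
    q_att = a_q @ Q                        # attended question feature, (d,)
    return v_att, q_att, a_v, a_q

def relation_features(V, q, W_g):
    """Hypothetical relation-network-style aggregation.
    For every ordered pair of regions (o_i, o_j), concatenate
    (o_i, o_j, q), apply one ReLU layer, and sum over all pairs.
    """
    n, _ = V.shape
    o_i = np.repeat(V, n, axis=0)          # (n*n, d)
    o_j = np.tile(V, (n, 1))               # (n*n, d)
    q_rep = np.tile(q, (n * n, 1))         # (n*n, d_q)
    pairs = np.concatenate([o_i, o_j, q_rep], axis=1)
    return np.maximum(pairs @ W_g, 0.0).sum(axis=0)

# Toy inputs: 4 image regions, 5 question words, feature size 8.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 8))
Q = rng.normal(size=(5, 8))
W = rng.normal(size=(8, 8)) * 0.1
W_g = rng.normal(size=(24, 16)) * 0.1      # input is 8 + 8 + 8 = 24 wide

v_att, q_att, a_v, a_q = co_attention(V, Q, W)
r = relation_features(V, q_att, W_g)
```

In this sketch the attended question feature conditions the pairwise relation step. The abstract describes RN features being fed into co-attention, so the actual system may connect the two modules in a different order.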
Keywords/Search Tags: Visual Question Answering, Collaborative Attention Mechanism, Relational Network, Natural Language Processing