Research And Implementation Of Visual Question Answering System Based On Collaborative Attention Mechanism

Posted on:2021-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Ge

Full Text:PDF

GTID:2438330626964282

Subject:Software engineering

Abstract/Summary:

Visual question answering (VQA) is a task that takes an image and a question as input and requires the computer to combine the visual and textual information to produce a natural-language answer. It draws on both computer vision and natural language processing (NLP): computer vision is used to understand the input image, and NLP is used to understand the input question and generate the answer. The key to VQA lies in fusing the visual and linguistic features extracted from the image and the question. In recent years, many CNN+LSTM networks have shown good results, and more recently many networks have applied attention to VQA. Still, the accuracy of VQA is not ideal, especially for questions that involve relational reasoning and counting. To address this problem, this study trains the network with a collaborative attention (co-attention) mechanism, whose function is to generate correlated feature pairs for image-question pairs, and uses a relation network (RN) to infer the relationships among objects in the image and between those objects and the question, helping the model predict the answer. This thesis focuses on a visual question answering system based on the co-attention mechanism. The main research contents are:

1) The co-attention mechanism is studied further and an effective co-attention mechanism is built. It generates paired image-attention and question-attention features and lets the network learn these paired features autonomously; experiments show that this improves VQA accuracy.
2) To address the low accuracy of VQA on complex questions, a relation network (RN) inference module is constructed to further improve the model's reasoning, so that the model can extract features relevant to complex questions.
3) The features produced by the RN are fed into the co-attention mechanism to extract correlated feature pairs, which help the model predict answers and improve the system's accuracy on complex questions such as relational inference.
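
The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of the data flow it describes: relation-aware object features are computed first and then passed to a co-attention step that yields one attended image vector and one attended question vector. The pairwise function `g` (a single ReLU layer), the max-pooled affinity form of co-attention, all dimensions, and every name (`relation_features`, `co_attention`, `W_g`, `W_c`) are assumptions made for illustration, not the thesis's actual architecture, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def relation_features(objects, q_vec, W_g):
    """Relation-network-style features (assumed form).

    objects: (n, d) object/region features, q_vec: (d_q,) question vector,
    W_g: (2*d + d_q, h) weights of a one-layer pairwise function g.
    Returns (n, h): for each object i, the sum over j of g(o_i, o_j, q).
    """
    n = objects.shape[0]
    o_i = np.repeat(objects, n, axis=0)            # row i*n + j holds o_i
    o_j = np.tile(objects, (n, 1))                 # row i*n + j holds o_j
    q = np.tile(q_vec, (n * n, 1))                 # question copied to every pair
    pairs = np.concatenate([o_i, o_j, q], axis=1)  # (n*n, 2*d + d_q)
    g = np.maximum(pairs @ W_g, 0.0)               # ReLU(g), shape (n*n, h)
    return g.reshape(n, n, -1).sum(axis=1)         # sum over partner objects j

def co_attention(regions, words, W_c):
    """Co-attention over image regions and question words (assumed form).

    regions: (n, h), words: (m, d_q), W_c: (d_q, h).
    Returns attended image vector, attended question vector, and both
    attention distributions.
    """
    affinity = np.tanh(words @ W_c @ regions.T)    # (m, n) word-region affinity
    a_v = softmax(affinity.max(axis=0))            # attention over regions
    a_q = softmax(affinity.max(axis=1))            # attention over words
    return a_v @ regions, a_q @ words, a_v, a_q

# Toy run with random inputs; all sizes are arbitrary.
rng = np.random.default_rng(0)
n_objects, n_words, d, d_q, h = 5, 7, 8, 6, 4
objects = rng.normal(size=(n_objects, d))
words = rng.normal(size=(n_words, d_q))
q_vec = words.mean(axis=0)                         # crude question summary
W_g = rng.normal(size=(2 * d + d_q, h))
W_c = rng.normal(size=(d_q, h))

rel = relation_features(objects, q_vec, W_g)       # RN features per object
v_att, q_att, a_v, a_q = co_attention(rel, words, W_c)
fused = np.concatenate([v_att, q_att])             # would feed an answer classifier
print(rel.shape, a_v.shape, a_q.shape, fused.shape)
```

In a trained system the fused vector would be passed to an answer classifier and the weights learned end to end; here the run only confirms that the shapes line up and that each attention vector is a probability distribution.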

Keywords/Search Tags:

Visual Question Answering, Co-Attention Mechanism, Relation Network, Natural Language Processing

Related items

1	Research And Implementation Of Natural Language Processing Algorithm For Reasoning
2	Research On Visual Question Answering Method Based On Attention Mechanism
3	Research On Image Description Generation Based On Visual Attention
4	Research On Text Classification Method Combining Attention Mechanism And Bi-GRU
5	Financial Market Trend Forecast Based On Deep Learning And Natural Language Processing
6	Research On Visual Question Answering Method With Visual Content Understanding And Text Information Analysis
7	Research On Collaborative Attention Model And Deep Correlated Networks For Visual Question Answer
8	Question Answering Model Based On Self-Attention Mechanism
9	Research On Recommendation Models Based On Multi-Granular Attention Networks
10	Natural Answer Generation With Attention Over Multi-instances