Fine-grained Visual Question Answering Based On Deep Learning

Posted on:2020-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

GTID:2428330605467984

Subject:Computer Science and Technology

Abstract/Summary:

As its application scenarios multiply, visual question answering (VQA), an interdisciplinary task, has become an active research topic in the cross-media field. The task requires a machine to automatically produce the correct answer to a natural-language question about an image. In recent years, many deep learning models have been proposed for VQA. These models use visual attention mechanisms to capture the interactions between images and questions, which lets them extract fine-grained semantic information from images and achieve better performance. Building on classical VQA algorithms, this thesis explores VQA models under different visual attention mechanisms and strengthens their understanding of fine-grained image semantics to further improve the accuracy of fine-grained VQA. The main contributions of this dissertation are as follows.

1. A VQA model based on a cross-channel attention mechanism is proposed. The model captures the interdependence between feature channels, adaptively recalibrates the feature response of each channel, and thereby obtains the subjectively understood focus of attention for answering the question. Experimental results show that the proposed cross-channel attention module effectively improves the model's fine-grained understanding. It also has low computational complexity and few parameters, and it is easy to embed into existing VQA model frameworks.

2. A fine-grained VQA model based on an adaptive attention module enhanced with region-proposal geometric features is proposed. By modeling the geometric features of physical locations within the image, the model improves its ability to correct attention weights. Experimental results show that this geometry-enhanced adaptive attention module corrects attention weights more intuitively and significantly improves the fine-grained VQA model, achieving higher accuracy.
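The abstract does not give the equations of the cross-channel attention module. One well-known way to "model the interdependence between feature channels and adaptively adjust the feature response between channels" is squeeze-and-excitation (SE) gating, sketched below in plain Python as an illustration only, not as the thesis's actual module. Assumptions: the feature map is C channels of flattened H*W activations; the weight matrices `w1`/`w2`, the reduction ratio `r`, and the optional question embedding concatenated to the pooled descriptor are all hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(
    features: Matrix,                         # C channels, each a flattened list of H*W activations
    w1: Matrix,                               # hypothetical (C // r) x (C + Q) reduction weights
    w2: Matrix,                               # hypothetical C x (C // r) expansion weights
    question: Optional[List[float]] = None,   # hypothetical Q-dim question embedding
) -> Tuple[Matrix, List[float]]:
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Assumption: condition the gates on the question by simple concatenation.
    z = squeezed + (question or [])
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: reweight every activation of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Usage with random toy data: 8 channels, 2x2 spatial grid, reduction ratio 2, 3-dim question.
random.seed(0)
C, HW, r, Q = 8, 4, 2, 3
features = [[random.gauss(0, 1) for _ in range(HW)] for _ in range(C)]
question = [random.gauss(0, 1) for _ in range(Q)]
w1 = [[random.gauss(0, 0.1) for _ in range(C + Q)] for _ in range(C // r)]
w2 = [[random.gauss(0, 0.1) for _ in range(C // r)] for _ in range(C)]
scaled, gates = channel_attention(features, w1, w2, question)
print([round(g, 3) for g in gates])
```

The only extra parameters are the two small bottleneck matrices, roughly 2C²/r weights, which is consistent in spirit with the abstract's claim of low computational cost and easy embedding into an existing VQA backbone.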
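The abstract also does not specify how the geometric features of region proposals correct the attention weights. As a hedged illustration, the sketch below borrows the relative-geometry weighting popularized by Relation Networks for object detection: a 4-d relative box geometry is projected to a non-negative gate that multiplies the appearance-based attention (equivalently, its log is added to the logit). This is a swapped-in technique, not the thesis's module. The linear projection `w_geo` (instead of a sinusoidal embedding), the box format `(x1, y1, x2, y2)`, and the omission of question conditioning are simplifying assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed normalized with x2 > x1, y2 > y1


def relative_geometry(bm: Box, bn: Box, eps: float = 1e-3) -> List[float]:
    """4-d relative geometry of box n with respect to box m (Relation-Networks style)."""
    xm, ym = (bm[0] + bm[2]) / 2, (bm[1] + bm[3]) / 2
    xn, yn = (bn[0] + bn[2]) / 2, (bn[1] + bn[3]) / 2
    wm, hm = bm[2] - bm[0], bm[3] - bm[1]
    wn, hn = bn[2] - bn[0], bn[3] - bn[1]
    return [
        math.log(max(abs(xm - xn), eps) / wm),  # eps avoids log(0) when centers coincide
        math.log(max(abs(ym - yn), eps) / hm),
        math.log(wn / wm),
        math.log(hn / hm),
    ]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def geometry_attention(
    feats: List[List[float]], boxes: List[Box], w_geo: List[float], eps: float = 1e-6
) -> Tuple[List[List[float]], List[List[float]]]:
    """Self-attention over N region proposals, with weights corrected by box geometry."""
    n, d = len(feats), len(feats[0])
    attn = []
    for m in range(n):
        logits = []
        for k in range(n):
            # Appearance term: scaled dot product between the two region features.
            app = sum(a * b for a, b in zip(feats[m], feats[k])) / math.sqrt(d)
            # Geometric gate: ReLU of a (hypothetical) linear projection of relative geometry.
            rel = relative_geometry(boxes[m], boxes[k])
            geo = max(0.0, sum(w * g for w, g in zip(w_geo, rel)))
            # Adding log(geo) to the logit equals multiplying exp(app) by geo before normalizing.
            logits.append(app + math.log(max(geo, eps)))
        attn.append(softmax(logits))
    # Each output region feature is the attention-weighted sum of all region features.
    out = [[sum(attn[m][k] * feats[k][j] for k in range(n)) for j in range(d)] for m in range(n)]
    return out, attn


# Usage with three toy proposals and random 4-dim features.
random.seed(1)
boxes = [(0.1, 0.1, 0.4, 0.5), (0.5, 0.2, 0.9, 0.6), (0.3, 0.6, 0.6, 0.9)]
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in boxes]
w_geo = [-0.5, -0.5, 0.2, 0.2]  # negative weights on log-distance favor nearby boxes
out, attn = geometry_attention(feats, boxes, w_geo)
print([[round(a, 3) for a in row] for row in attn])
```

Because the geometric gate multiplies rather than replaces the appearance score, a region pair that looks similar but is spatially implausible can be down-weighted, which is one concrete reading of "correcting attention weights" with location information.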

Keywords/Search Tags:

Visual Question Answering, Cross-media, Deep Learning, Attention Mechanism, Geometric Feature