Fine-grained Visual Question Answering Based On Deep Learning

Posted on: 2020-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2428330605467984
Subject: Computer Science and Technology
Abstract/Summary:
As application scenarios multiply, visual question answering (VQA), an interdisciplinary task, has become an active research topic in the cross-media field. The task requires a machine to answer a natural-language question about an image correctly and automatically. In recent years, many deep-learning models have emerged for VQA. These models use visual attention mechanisms to capture interactions between images and questions, which lets them extract fine-grained semantic information from images and achieve better performance. Building on classical VQA algorithms, this thesis explores VQA models under different visual attention mechanisms and strengthens their understanding of fine-grained image semantics, in order to further improve the accuracy of fine-grained VQA. The main contributions of this thesis are as follows.

1. A VQA model based on a cross-channel attention mechanism is proposed. The model captures the interdependence between feature channels, adaptively adjusts the feature response of each channel, and obtains a focus for visual question answering that matches subjective understanding. Experimental results show that the proposed cross-channel attention module effectively improves the model's fine-grained understanding, has low computational complexity and few parameters, and is easy to embed in existing VQA frameworks.

2. A fine-grained VQA model based on an adaptive attention module with region-proposal geometric feature enhancement is proposed. The model improves its ability to correct attention weights by modeling the geometric features of the physical locations of region proposals within the image. Experimental results show that this module corrects the attention weights more intuitively and that the fine-grained VQA model achieves significantly higher accuracy.
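The abstract does not give the architecture of either module, so the following is only a rough NumPy sketch of the two ideas under stated assumptions, not the thesis's implementation. The channel gating assumes a squeeze-and-excitation-style design (average over regions, a small two-layer bottleneck, sigmoid gates), and the geometric correction assumes a learned linear bias on the attention logits computed from normalized box coordinates. All function names, weight shapes, and the choice of geometric features are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()


def channel_attention(features, w1, w2):
    """Reweight feature channels (assumed squeeze-and-excitation style).

    features: (num_regions, channels) region features
    w1: (channels, hidden) and w2: (hidden, channels) bottleneck weights
    """
    squeezed = features.mean(axis=0)         # one summary value per channel
    hidden = np.maximum(0.0, squeezed @ w1)  # ReLU bottleneck
    gates = sigmoid(hidden @ w2)             # per-channel gates in (0, 1)
    return features * gates, gates


def box_geometry(boxes, img_w, img_h):
    """Map pixel boxes (x1, y1, x2, y2) to normalized corners plus area fraction."""
    scale = np.array([img_w, img_h, img_w, img_h], dtype=float)
    norm = boxes / scale
    area = (norm[:, 2] - norm[:, 0]) * (norm[:, 3] - norm[:, 1])
    return np.concatenate([norm, area[:, None]], axis=1)


def geometry_adjusted_attention(logits, geometry, w_geo):
    """Add a geometry-derived bias to the attention logits (assumed form)."""
    return softmax(logits + geometry @ w_geo)


# Illustrative run with random weights; real weights would be learned.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 8))  # 2 region proposals, 8 channels
w1 = rng.normal(scale=0.1, size=(8, 4))
w2 = rng.normal(scale=0.1, size=(4, 8))
reweighted, gates = channel_attention(features, w1, w2)

boxes = np.array([[0.0, 0.0, 320.0, 240.0], [160.0, 120.0, 640.0, 480.0]])
geometry = box_geometry(boxes, img_w=640, img_h=480)
attention = geometry_adjusted_attention(
    logits=rng.normal(size=2), geometry=geometry, w_geo=rng.normal(size=5)
)
print(reweighted.shape, gates.shape, attention)
```

In this sketch the gating adds only two small weight matrices, which is consistent with the abstract's claim of low parameter count, though the actual module may differ.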
Keywords/Search Tags: Visual Question Answering, Cross-media, Deep Learning, Attention Mechanism, Geometric Feature