Research On Visual Question Answering Model Based On Attention Mechanism And Feature Fusion

Posted on:2024-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:K Li

GTID:2568307130458924

Subject:Electronic information

Abstract/Summary:

Visual question answering (VQA) has been one of the active research directions in artificial intelligence in recent years. With the continuing development of computer vision and natural language processing, VQA research has made considerable progress and has both theoretical significance and practical value. Compared with a traditional question answering system, a VQA system must not only process a question given as text but also relate it to the corresponding image content in order to answer, and there is a natural semantic gap between the question and the image. A basic approach to bridging this gap is to capture the spatial and semantic relevance between the image and the question. To address the difficulties in current VQA systems, the main work completed in this thesis is as follows:

1. An MSP network is proposed for processing image features in VQA tasks. MSP uses MobileNetV3 as its backbone and embeds a spatial pyramid pooling structure into the backbone to improve prediction accuracy. Compared with Faster R-CNN, the MSP network is general and robust, and it reduces the system's computation while maintaining recognition accuracy.

2. The common output classifier is improved. To better integrate features with inconsistent semantics and scales, the feature fusion method in the classifier is changed from additive fusion to AFF fusion, so that the fully connected layer of the classifier can further overcome feature confusion when acquiring deep targets, and the model's ability to fuse deep target features is strengthened.

3. To better overcome the language prior problem in VQA, a visual branch model is proposed. Adding a visual component further balances the relationship between language and vision and strengthens the role of vision in VQA tasks, so that the overall model improves its outputs on VQA tasks and visual content has a greater influence on the answers.
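The two building blocks named in the abstract, spatial pyramid pooling and the move from additive to attention-weighted fusion, can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation. It assumes that "AFF" refers to attentional feature fusion, where a weight m in (0, 1) blends two features as m*x + (1 - m)*y. The function names, the `levels` and `gate_scale` parameters, and the simple sigmoid gate (standing in for a learned attention module) are all hypothetical.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into a fixed-length vector.

    For each level n the map is split into an n x n grid of bins and the
    maximum of each bin is kept, so the output length is sum(n * n over
    levels) regardless of the input height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start edge and ceil for the end edge, so the
            # bins cover the whole map and no bin is empty.
            top = (i * height) // n
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = (j * width) // n
                right = math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


def additive_fusion(x, y):
    """Baseline fusion: element-wise sum of two feature vectors."""
    return [a + b for a, b in zip(x, y)]


def gated_fusion(x, y, gate_scale=1.0):
    """AFF-style fusion sketch: z = m * x + (1 - m) * y.

    ASSUMPTION: the weight m is a plain sigmoid of the summed features.
    A real attentional fusion module would learn this weight.
    """
    fused = []
    for a, b in zip(x, y):
        m = 1.0 / (1.0 + math.exp(-gate_scale * (a + b)))
        fused.append(m * a + (1.0 - m) * b)
    return fused
```

With the default levels, `spatial_pyramid_pool` returns 1 + 4 + 16 = 21 values for a map of any size, which is what lets a pooled backbone feed a fixed-size classifier. Unlike `additive_fusion`, each output of `gated_fusion` is a convex combination and therefore stays between the two inputs it blends.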

Keywords/Search Tags:

Visual question answering system, Deep learning, Attention mechanism, Feature fusion

Related items

1	Research And Algorithm Implementation Of Efficient Visual Question Answering Based On Deep Learning
2	Research On Visual Question Answering Method And System Based On Deep Learning
3	Research On Visual Question Answering Based On Deep Neural Network
4	Research On Collaborative Attention Model And Deep Correlated Networks For Visual Question Answer
5	Fine-grained Visual Question Answering Based On Deep Learning
6	Research On Visual Question Answering Algorithm Based On Feature Fusion Of Attention Mechanism
7	Research On Visual Question Answer Algorithm Based On Attention Mechanism
8	Research On Multimodal Attention Mechanism And Information Fusion For Visual Question Answering
9	Research On Visual Question Answering System Based On Image Attention
10	Research And Implementation Of Visual Question Answering Algorithm Based On Deep Attention Stacking