
Research On Visual Question Answering Model Based On Attention Mechanism And Feature Fusion

Posted on: 2024-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Li
GTID: 2568307130458924
Subject: Electronic information
Abstract/Summary:
Visual question answering (VQA) has been one of the most active research directions in artificial intelligence in recent years. With the continued development of computer vision and natural language processing, VQA research has made great progress and has both theoretical significance and practical value. Unlike a traditional question answering system, a VQA system must not only process a question given as text but also relate it to the content of the corresponding image in order to answer, and there is a natural semantic gap between the question and the image. A basic approach to bridging this gap is to capture the spatial and semantic relevance between the image and the question. To address the difficulties in current VQA systems, this thesis makes the following main contributions:

1. An MSP network is proposed for processing image features in VQA tasks. MSP uses MobileNetV3 as its backbone and embeds a spatial pyramid pooling structure into the backbone to improve prediction accuracy. Compared with Faster R-CNN, the MSP network is general and robust, and it reduces the system's computational cost while maintaining recognition accuracy.

2. The commonly used output classifier is improved. To better integrate features with inconsistent semantics and scales, feature fusion in the classifier is changed from additive fusion to AFF-based fusion, so that the fully connected layer of the classifier can further overcome feature confusion when acquiring deep target features, strengthening the model's ability to fuse them.

3. To better overcome the language prior problem in VQA, a visual branch model is proposed. Adding a visual component further balances the relationship between language and vision and strengthens the role of vision in VQA tasks, so that the overall model produces better outputs. This approach increases the influence of visual content on the answers.
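
The two structural ideas named in contributions 1 and 2 can be illustrated with a minimal sketch. It is not the thesis's implementation, which the abstract does not give. It assumes that "spatial pyramid pooling" means max-pooling a feature map over grids of several sizes to obtain a fixed-length vector, and that "AFF" refers to attentional feature fusion of the form Z = M * X + (1 - M) * Y, where M is a gate in (0, 1) computed from X + Y. The function names, the pyramid levels, and the per-element scalar gate (a stand-in for a learned attention module) are all hypothetical.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D channel into a fixed-length vector.

    For each level n the map is split into an n x n grid and the maximum of
    each cell is kept, so the output length is sum(n * n for n in levels)
    regardless of the input height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges guarantee that every cell is non-empty.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def additive_fusion(x, y):
    """Baseline: element-wise sum of two feature vectors."""
    return [a + b for a, b in zip(x, y)]


def attentional_fusion(x, y, gate_weights, gate_bias=0.0):
    """Gated fusion Z = M * X + (1 - M) * Y.

    Assumption: M = sigmoid(w * (X + Y) + b) per element, with hypothetical
    weights w standing in for a learned attention module.
    """
    fused = []
    for a, b, w in zip(x, y, gate_weights):
        m = sigmoid(w * (a + b) + gate_bias)
        fused.append(m * a + (1.0 - m) * b)
    return fused


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(spatial_pyramid_pool(fmap, levels=(1, 2)))
    print(additive_fusion([2.0, 4.0], [0.0, 2.0]))
    print(attentional_fusion([2.0, 4.0], [0.0, 2.0], [0.5, -0.5]))
```

In this sketch the pooled vector has the same length for any input size, and the gated fusion produces a per-element weighted average of the two inputs rather than their sum, so the gate decides how much each input contributes.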
Keywords/Search Tags:Visual question answering system, Deep learning, Attention mechanism, Feature fusion