Research On Visual Question Answering Based On Multiple Attention Mechanism And Feature Fusion Algorithm

Posted on:2021-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:S T Zhou

Full Text:PDF

GTID:2428330614458480

Subject:Control Science and Engineering

Abstract/Summary:


Visual question answering (VQA) is a frontier research direction that combines computer vision and natural language processing. Given an image and a question about it, a VQA system uses the semantics of the question to find the relevant information in the image and predict the answer. A VQA model consists of four modules: image feature processing, text feature processing, multi-modal feature fusion, and answer prediction. The first two belong to the category of feature extraction. Feature extraction, multi-modal feature fusion, and improvement of the attention mechanism remain difficult problems in current VQA research, so this thesis studies these three problems:

1. Image preprocessing model based on the Faster R-CNN object detection algorithm. This thesis combines Faster R-CNN and ResNet-101 to process image information. Faster R-CNN identifies object instances belonging to given classes and localizes them with bounding boxes. The ResNet-101 model preprocesses the VQA v2 dataset and extracts 2048-dimensional image feature vectors, which are stored as matrix files and used in training the visual question answering models.

2. Research on a visual question answering model based on multi-modal feature fusion. To address cross-modal feature fusion, and building on the work in item 1, this thesis uses pre-trained word vector tools and a long short-term memory (LSTM) network to encode the text, forming a 2048-dimensional feature vector that represents the question. The 2048-dimensional image feature vector and the 2048-dimensional question feature vector are then fed into a multi-modal factorized bilinear pooling (MFB) feature fusion module to generate fused features. Finally, the answer prediction module uses softmax as the classifier to output the predicted answer. Experimental results on the VQA v2 dataset indicate that the visual question answering model constructed in this thesis is sound.

3. Research on a visual question answering model based on multiple attention mechanisms and multi-modal feature fusion. To strengthen the semantic information of the model and capture more accurate image features, this thesis adds a self-attention mechanism, a guided-attention mechanism, and a multi-head attention mechanism to the model from item 2, forming a visual question answering model based on multiple attention mechanisms. The aim is to better capture the semantic relations between images and text and to narrow the gap in multi-modal feature fusion. Experimental results show that the model combining multiple attention mechanisms with the multi-modal factorized bilinear pooling feature fusion algorithm achieves higher accuracy and is superior to advanced existing models.
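
The fusion and prediction steps described in item 2 can be illustrated with a minimal NumPy sketch. It assumes the bilinear pooling fusion named in the abstract follows the commonly published factorized form (project both vectors, multiply element-wise, sum-pool, then normalize); the thesis's actual implementation is not shown here. The names `mfb_fuse`, `U`, `V`, `W_cls` and the sizes `o`, `k`, and `num_answers` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def mfb_fuse(img_feat, q_feat, U, V, k):
    """Fuse one image vector and one question vector (factorized bilinear pooling sketch)."""
    # Project both modalities into a shared (o * k)-dimensional space
    # and combine them with an element-wise product.
    joint = (img_feat @ U) * (q_feat @ V)
    # Sum-pool over non-overlapping windows of size k, giving an o-dimensional vector.
    z = joint.reshape(-1, k).sum(axis=1)
    # Power normalization followed by L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


rng = np.random.default_rng(0)

# d = 2048 matches the feature size stated in the abstract;
# o, k, and num_answers are illustrative assumptions.
d, o, k, num_answers = 2048, 16, 5, 10

img_feat = rng.standard_normal(d)   # stand-in for a ResNet-101 image feature
q_feat = rng.standard_normal(d)     # stand-in for an LSTM question feature

# Random, untrained projection and classifier weights (hypothetical).
U = rng.standard_normal((d, o * k)) * 0.01
V = rng.standard_normal((d, o * k)) * 0.01
W_cls = rng.standard_normal((o, num_answers))

fused = mfb_fuse(img_feat, q_feat, U, V, k)
probs = softmax(fused @ W_cls)          # distribution over candidate answers
predicted_answer = int(np.argmax(probs))
```

In a trained model the projections and classifier would be learned jointly, and the image input would typically be a set of region features weighted by attention rather than a single vector.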

Keywords/Search Tags:

visual question answering, object detection algorithm, multi-modal feature fusion, multiple attention mechanism


Related items

1	Research And Algorithm Implementation Of Efficient Visual Question Answering Based On Deep Learning
2	Research On Visual Question Answering Algorithm Based On Feature Fusion Of Attention Mechanism
3	Research On Visual Question Answering Algorithm Based On Spatial Attention Reasoning Mechanism
4	Research On Visual Question Answering Method And System Based On Deep Learning
5	Research On Visual Question Answering Based On Text Semantic Understanding
6	Research On Visual Question Answering Based On Deep Learning
7	Research On Visual Question Answering System Based On Image Attention
8	Research On Visual Question Answering Based On Deep Neural Network
9	A Research Of Video Question Answering Based On Deep Learning
10	Research On Visual Question Answer Algorithm Based On Attention Mechanism