Video Question Answering Based On Deep Attention And Deep Fusion

Posted on:2021-02-13

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhang


GTID:2428330623469128

Subject:Computer Science and Technology

Abstract/Summary:


Data increasingly appear in unstructured form, and video has become the main carrier of information. Automatically analyzing massive volumes of video and extracting useful information from them is very challenging. Video question answering, as one of the most measurable directions in visual semantic understanding, has received widespread attention from researchers. Its goal is to understand a given video and produce a reasonable answer to a natural language question about it, which can greatly help visual semantic understanding of video at scale. Current mainstream methods for video question answering use deep neural networks whose basic components are convolutional neural networks, recurrent neural networks, and attention mechanisms. However, existing models cannot make full use of text information, their attention mechanisms lean toward video features, and their fusion mechanisms cannot fully fuse the multimodal features of text and video. This thesis proposes a video question answering model based on deep attention and deep fusion. Deep attention is a new method of attention calculation that constructs a frame-word-level attention map over multiple glimpses, efficiently obtaining the correlation weights between frames and words while greatly reducing the number of parameters required to build the attention map. Deep fusion is a multi-output model structure based on residual learning. It makes more effective use of the attention map information from multiple glimpses and introduces a Refine module that continuously optimizes the fused features to make their information more targeted. Comparative experiments on three datasets verify the effectiveness of the model. The results show that the proposed algorithm better solves the problem of video and text fusion and achieves significant improvements on all three challenging datasets.
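The abstract does not give the model's equations, so the following is only an illustrative sketch of the two ideas it names: a frame-word attention map computed for several glimpses, and a residual, multi-output fusion of those maps. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the function names, the shared projection with a per-glimpse scale vector (one possible way to keep the per-glimpse parameter count small), and the softmax over the whole map. The Refine module is not modeled.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_2d(scores):
    """Softmax over all entries of a 2-D score matrix, so the map sums to 1."""
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def attention_maps(frames, words, w_frame, w_word, glimpse_scales):
    """Return projected features and one frames-by-words map per glimpse.

    Assumed form (hypothetical): frames and words are projected into a shared
    k-dimensional space once, and each glimpse only adds a k-dimensional
    scale vector.
    """
    f = matmul(frames, w_frame)  # (num_frames, k)
    q = matmul(words, w_word)    # (num_words, k)
    maps = []
    for scale in glimpse_scales:
        scores = [
            [sum(s * a * b for s, a, b in zip(scale, fi, qj)) for qj in q]
            for fi in f
        ]
        maps.append(softmax_2d(scores))
    return f, q, maps


def residual_fusion(f, q, maps):
    """Fuse glimpses one at a time, keeping one fused feature per glimpse."""
    k = len(f[0])
    fused = [0.0] * k
    outputs = []
    for amap in maps:
        # Attention-weighted sum of element-wise frame-word products.
        update = [
            sum(amap[i][j] * f[i][d] * q[j][d]
                for i in range(len(f)) for j in range(len(q)))
            for d in range(k)
        ]
        fused = [x + u for x, u in zip(fused, update)]  # residual connection
        outputs.append(list(fused))                     # multi-output: one per glimpse
    return outputs


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]; stands in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
frames = rand_matrix(4, 6, rng)          # 4 frames with 6-dim features (toy sizes)
words = rand_matrix(3, 5, rng)           # 3 question words with 5-dim features
w_frame = rand_matrix(6, 2, rng)         # projection to a shared 2-dim space
w_word = rand_matrix(5, 2, rng)
glimpse_scales = rand_matrix(2, 2, rng)  # 2 glimpses

f, q, maps = attention_maps(frames, words, w_frame, w_word, glimpse_scales)
outputs = residual_fusion(f, q, maps)
print(len(maps), len(maps[0]), len(maps[0][0]), len(outputs))
```

In this sketch each glimpse yields a 4-by-3 map whose entries sum to 1, and each fusion step adds its contribution to the running feature instead of replacing it. That is the sense in which the fusion is residual here; a trained model would learn the projection and scale values rather than sample them.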

Keywords/Search Tags:

Video Question Answering, Attention Mechanism, CNN, RNN


Related items

1	Spatio-Temporal Attention Networks For Video Question Answering
2	Research And Implementation Of Video Question Answering With Multimodal Data
3	Research On The Factoid Question Answering Based On Attention Pooling Mechanism And External Knowledge
4	Research On Deep Learning Algorithm For Automatic Question Answering
5	Research And Application Of Question Answering System Based On Attention Mechanism Semantic Matching
6	Video Question Answering Based On Deep Memory Fusion Method
7	Video Question Answering Based On Deep Attention And Deep Fusion
8	Object-oriented Two-Stream Network And Heterogeneous Graph Reasoning On Video Question Answering
9	Research On Algorithms Combining Attention Mechanism And Gate Mechanism In Question Answering System
10	Research On Deep Learning-based Multi-document Passage Ranking Methods For Question Answering System