Multi-Turn Video Question Answering Via Attention Mechanism

Posted on: 2020-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: X H Jiang
GTID: 2428330572496874
Subject: Computer Science and Technology
Abstract:
Video question answering is an important problem at the intersection of computer vision and natural language processing. The task is to generate an accurate answer from the referenced video content, given a question and the visual conversation context. Most existing video question answering methods address only the single-turn setting. These methods rely on deep neural networks, including recurrent neural networks, convolutional neural networks, and attention mechanisms. Because they do not adequately model the sequential dialogue context, they may not be directly applicable to multi-turn video question answering. We propose a new multi-turn video question answering model. First, we build a context-aware question embedding by modeling the hierarchical structure of the sequential dialogue context. Then, we develop a multi-stream spatio-temporal attention network that learns a joint representation of the dynamic video content and the context-aware question embedding. Next, we use a multi-level, multi-step reasoning attention network to perform multi-turn video question answering. We construct two large-scale multi-turn video question answering datasets. Experiments on these datasets demonstrate the effectiveness of the proposed approach.
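The multi-step attention idea named in the abstract can be illustrated with a minimal sketch. This is not the thesis's model: the function names, the dot-product scoring, and the additive query update are all assumptions made for illustration, and plain Python lists stand in for learned frame and question embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention: weight each feature vector by its similarity to the query."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return context, weights


def multi_step_attention(question, frames, steps=2):
    """Refine the query over several attention passes.

    The update rule (query + context) is a hypothetical choice, not taken from the thesis.
    """
    query = list(question)
    weights = []
    for _ in range(steps):
        context, weights = attend(query, frames)
        query = [q + c for q, c in zip(query, context)]
    return query, weights


# Toy usage: two "frame" vectors and a question vector aligned with the first frame.
frames = [[1.0, 0.0], [0.0, 1.0]]
question = [1.0, 0.0]
refined_query, frame_weights = multi_step_attention(question, frames, steps=2)
```

In this toy setup the attention weights concentrate on the frame most similar to the question, and each additional step sharpens that focus because the attended context is folded back into the query.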
Keywords: video question answering, multi-turn, attention mechanism