
Multi-Grained Hierarchical Attentional Recurrent Network For Video Question Answering

Posted on: 2019-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: J H Lin
Full Text: PDF
GTID: 2428330548977418
Subject: Computer Science and Technology
Abstract/Summary:
Video is a form of multimedia that carries a massive amount of information, so enabling computers to understand video content quickly and accurately is a challenging and meaningful problem. This work focuses on the video question answering task, which requires choosing the most accurate answer given a video and a question. Because performance on this task is easy to verify, it offers a good opportunity to explore solutions for understanding video content. Most existing methods rely on static image features and simple models, and they face two problems. First, by feeding static features in sequentially, they may not learn the continuity between video frames well. Second, with only simple recurrent neural networks, they may lose important information during learning when the input sequence is long. To tackle these two problems, this work uses dynamic video features learned from several consecutive video frames and designs a multi-level attention neural network. This design can attend to multiple granularities of the question during learning and capture more complete video information to select the best answer. With this method, we obtain the best performance among all known methods on two reliable datasets. Furthermore, we verify its practicability by examining the detailed parameters of our neural network.
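The idea of attending to video features at more than one granularity of the question can be illustrated with a small sketch. This is not the thesis's architecture: the function names, the scaled dot-product scoring, and the use of a mean word vector as the question-level query are all assumptions made for illustration, and the recurrent component is omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, clip_feats):
    """Soft attention (assumed scoring: scaled dot product).

    Returns a weighted sum of clip features and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in clip_feats])
    dim = len(clip_feats[0])
    context = [
        sum(w * f[i] for w, f in zip(weights, clip_feats)) for i in range(dim)
    ]
    return context, weights


def multi_grained_attend(word_vecs, clip_feats):
    """Hypothetical two-granularity attention over clip-level features.

    Word level: every question word attends over the clips.
    Question level: the mean word vector (an assumption) attends over the clips.
    """
    word_contexts = [attend(w, clip_feats)[0] for w in word_vecs]
    n = len(word_vecs)
    question_vec = [
        sum(w[i] for w in word_vecs) / n for i in range(len(word_vecs[0]))
    ]
    question_context, question_weights = attend(question_vec, clip_feats)
    return word_contexts, question_context, question_weights
```

Here `clip_feats` stands in for the dynamic features computed from groups of consecutive frames, and `word_vecs` for question word embeddings of the same dimension. A full model would pass these contexts through a recurrent network before scoring candidate answers.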
Keywords/Search Tags: Video Question Answering, Dynamic Video Feature, Attention Network, Recurrent Neural Network