
Studies On Video Modeling And Action Recognition Based On Recurrent Neural Networks

Posted on: 2019-07-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Du
GTID: 1368330566959283
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
Action modeling and recognition in videos has long been a popular yet challenging topic. Studies have demonstrated the success of Recurrent Neural Networks (RNNs) on sequence tasks such as machine translation. However, previous works that feed high-level features into RNNs did not achieve the expected results. This is mainly because, unlike other sequence data, video has its own characteristics. First, video data are high-dimensional with complex structure; for action recognition, certain actions are ambiguous within a single frame and require contextual information to be recognized. Second, the information in a single frame often lacks structure, and adjacent frames share considerable redundancy. These properties of video data increase the difficulty of modeling with recurrent neural networks. Based on this analysis, we design a recurrent spatial-temporal attention network and a recurrent pose-attention network for action recognition in videos, and we evaluate them on popular benchmarks, where the experimental results show their effectiveness.

In the first part of this thesis, we propose the recurrent spatial-temporal attention network. Our module automatically learns a spatial-temporal action representation from all sampled video frames that is compact and highly relevant to the prediction at the current step. We also design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework. We evaluate the proposed method on the UCF101, HMDB51, and JHMDB benchmarks. The experimental results show that our method outperforms other recent RNN-based approaches on UCF101 and HMDB51 and achieves state-of-the-art performance on JHMDB.

In the second part of this thesis, we propose the recurrent pose-attention network. Our pose-attention mechanism learns robust pose features by guiding the attention mechanism with pose information, which makes the information in a single frame more structured and easier to model. An important byproduct of this work is pose estimation in videos. We evaluate the proposed method quantitatively and qualitatively on two popular benchmarks, Sub-JHMDB and PennAction. Experimental results show that our method outperforms recent state-of-the-art methods on these challenging datasets. This part of the work was published in the Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2017 as an oral presentation.
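To make the general idea concrete, the following is a minimal sketch, not the dissertation's implementation, of attention-driven recurrent video modeling: at each recurrence step the LSTM hidden state attends over per-frame CNN features to build a compact summary of all sampled frames before predicting the action class. The class and variable names, feature dimensions, and number of recurrence steps are illustrative assumptions.

```python
# Hypothetical sketch of attention over sampled frame features feeding an LSTM
# classifier; dimensions and step count are assumptions, not thesis values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttentionClassifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=101):
        super().__init__()
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.att_query = nn.Linear(hidden_dim, feat_dim)  # hidden state -> attention query
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats, num_steps=3):
        # frame_feats: (batch, num_frames, feat_dim) per-frame appearance features
        b, t, d = frame_feats.shape
        h = frame_feats.new_zeros(b, self.lstm.hidden_size)
        c = frame_feats.new_zeros(b, self.lstm.hidden_size)
        for _ in range(num_steps):
            # attention scores between the current hidden state and every frame
            query = self.att_query(h)                                        # (b, d)
            scores = torch.bmm(frame_feats, query.unsqueeze(2)).squeeze(2)   # (b, t)
            weights = F.softmax(scores, dim=1)                               # (b, t)
            # compact summary of all sampled frames, relevant to this step
            context = torch.bmm(weights.unsqueeze(1), frame_feats).squeeze(1)  # (b, d)
            h, c = self.lstm(context, (h, c))
        return self.classifier(h)  # action logits


# Usage: 25 sampled frames with 2048-d features, UCF101-style 101 classes.
model = RecurrentAttentionClassifier()
logits = model(torch.randn(4, 25, 2048))
print(logits.shape)  # torch.Size([4, 101])
```

The same skeleton could, in principle, be extended toward the thesis's two directions: running a second motion-feature LSTM and fusing the two streams with attention, or computing the attention weights from pose cues so that the per-frame summary is pose-guided; those extensions are not shown here.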
Keywords/Search Tags:Recurrent Neural Networks, Video Modeling, Action Recognition