
Studies On Video Modeling And Action Recognition Based On Recurrent Neural Networks

Posted on: 2019-07-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W B Du
GTID: 1368330566959283
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
Action modeling and recognition in videos has long been a popular yet challenging topic. Studies have demonstrated the success of Recurrent Neural Networks (RNNs) on sequence tasks such as machine translation. However, previous works that feed high-level features into RNNs did not achieve the expected results. This is mainly because, unlike other sequence data, video has its own characteristics. First, video data are high-dimensional with complex structure; for action recognition, certain actions are ambiguous within a single frame and require contextual information to be recognized. Second, the information in a single frame often lacks structure, and adjacent frames share considerable redundancy. These properties of video data increase the difficulty of modeling with recurrent neural networks. Based on this analysis, we design a recurrent spatial-temporal attention network and a recurrent pose-attention network for action recognition in videos, and we evaluate them on popular benchmarks, where the experimental results show their effectiveness.

In the first part of this thesis, we propose the recurrent spatial-temporal attention network. Our module automatically learns a spatial-temporal action representation from all sampled video frames that is compact and highly relevant to the prediction at the current step. We also design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework. We evaluate the proposed method on the UCF101, HMDB51, and JHMDB benchmarks. The experimental results show that our method outperforms other recent RNN-based approaches on UCF101 and HMDB51 and achieves state-of-the-art performance on JHMDB.

In the second part of this thesis, we propose the recurrent pose-attention network. Our pose-attention mechanism learns robust pose features by guiding the attention mechanism with pose information, which makes the information in a single frame more structured and easier to model. An important byproduct of this work is pose estimation in videos. We evaluate the proposed method quantitatively and qualitatively on two popular benchmarks, Sub-JHMDB and PennAction. Experimental results show that our method outperforms recent state-of-the-art methods on these challenging datasets. This part of the work was published in the Proceedings of the IEEE International Conference on Computer Vision (ICCV) 2017 as an oral presentation.
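To make the general idea concrete, the following is a minimal sketch, not the dissertation's implementation, of attention-driven recurrent video modeling: at each recurrence step the LSTM hidden state attends over per-frame CNN features to build a compact summary of all sampled frames before predicting the action class. The class and variable names, feature dimensions, and number of recurrence steps are illustrative assumptions.

```python
# Hypothetical sketch of attention over sampled frame features feeding an LSTM
# classifier; dimensions and step count are assumptions, not thesis values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttentionClassifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=101):
        super().__init__()
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.att_query = nn.Linear(hidden_dim, feat_dim)  # hidden state -> attention query
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats, num_steps=3):
        # frame_feats: (batch, num_frames, feat_dim) per-frame appearance features
        b, t, d = frame_feats.shape
        h = frame_feats.new_zeros(b, self.lstm.hidden_size)
        c = frame_feats.new_zeros(b, self.lstm.hidden_size)
        for _ in range(num_steps):
            # attention scores between the current hidden state and every frame
            query = self.att_query(h)                                        # (b, d)
            scores = torch.bmm(frame_feats, query.unsqueeze(2)).squeeze(2)   # (b, t)
            weights = F.softmax(scores, dim=1)                               # (b, t)
            # compact summary of all sampled frames, relevant to this step
            context = torch.bmm(weights.unsqueeze(1), frame_feats).squeeze(1)  # (b, d)
            h, c = self.lstm(context, (h, c))
        return self.classifier(h)  # action logits


# Usage: 25 sampled frames with 2048-d features, UCF101-style 101 classes.
model = RecurrentAttentionClassifier()
logits = model(torch.randn(4, 25, 2048))
print(logits.shape)  # torch.Size([4, 101])
```

The same skeleton could, in principle, be extended toward the thesis's two directions: running a second motion-feature LSTM and fusing the two streams with attention, or computing the attention weights from pose cues so that the per-frame summary is pose-guided; those extensions are not shown here.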
Keywords/Search Tags:Recurrent Neural Networks, Video Modeling, Action Recognition