Action Recognition And Localization Based On Deep Learning

Posted on:2021-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Zhu

GTID:2428330647952410

Subject:Control Engineering

Abstract/Summary:

With the unprecedented progress of deep learning in the image domain, attention is increasingly turning to video. Video understanding has become an important topic in artificial intelligence and computer vision, and it mainly covers two tasks: trimmed action recognition and untrimmed temporal action localization. Trimmed action recognition refers to determining whether a trimmed video belongs to a specific predefined action category. Untrimmed temporal action localization refers to locating the action instances in an untrimmed video and finding their temporal boundaries. Both tasks have broad application prospects in intelligent surveillance, healthcare, autonomous driving and robotics.

Although great progress has been made, both tasks still suffer from problems such as viewpoint changes and spatio-temporal modeling. First, videos captured in practice exhibit camera motion, viewpoint changes, noisy backgrounds and widely varying action durations, so it is hard to locate action instances precisely. Second, since an action is a process of motion evolution along the temporal dimension, temporal information is vital to action recognition and localization; how to make full use of temporal context and how to explore spatio-temporal relations deserve further study. To address these problems, we propose two methods. The main contributions are as follows:

First, we propose a multi-view attention algorithm for trimmed action recognition. By introducing the multi-view attention mechanism into a basic 3D convolutional neural network, the new model learns fine-grained spatio-temporal information adaptively. We also use optical flow and the two-stream method to further improve recognition accuracy. We validate the effectiveness of the method on two benchmark datasets.

Second, we propose a collaborative local-global learning algorithm for untrimmed temporal action localization. Early methods focus on local features, but because the time spans of different actions vary, fixed geometric structures may not capture the diverse global information. The collaborative local-global learning algorithm therefore combines local and global features. Beyond the varying action durations, other factors such as viewpoint changes and camera motion also degrade temporal boundary localization. To address this, we introduce an attention mechanism, which gives the network the ability to focus on actions. We validate the effectiveness of the method on two benchmark datasets.
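
The abstract mentions combining an RGB stream with an optical-flow stream but does not say how the two are fused. As an illustration only, the sketch below shows one common choice, late fusion by a weighted average of per-class softmax scores. The function names, the `flow_weight` parameter and the example logits are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits, flow_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    flow_weight is an assumed hyperparameter in [0, 1]; 0 ignores the
    optical-flow stream and 1 ignores the RGB stream.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= flow_weight <= 1.0:
        raise ValueError("flow_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        (1.0 - flow_weight) * r + flow_weight * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def predict(rgb_logits, flow_logits, flow_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_two_stream(rgb_logits, flow_logits, flow_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up scores for three action classes from each stream.
    rgb = [2.0, 1.0, 0.0]
    flow = [0.0, 0.0, 5.0]
    print(predict(rgb, flow))
```

Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even when their networks produce scores of different magnitudes. The fusion weight would normally be tuned on a validation set.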

Keywords/Search Tags:

Deep learning, Action localization, Action recognition, Attention mechanism, Spatio-temporal modeling

Related items

1	Video Action Recognition Based On 2D Convolution Network Under Spatio-Temporal Feature Enhancement Mechanism
2	Research On Action Recognition Based On Deep Network Learning Of Spatio-temporal Features
3	Research On Human Skeleton Action Recognition Method Based On Graph Convolutional Network
4	Action Recognition And Temporal Action Localization Based On Attention Mechanism
5	Temporal Action Localization And Action Recognition Based On Deep Learning
6	Research Of Temporal Action Localization Algorithm Based On Weakly-Supervised Deep Learning
7	Research On Human Action Recognition Method Based On Deep Learning
8	Research On Video Action Recognition Method Based On Spatio-temporal Feature Modeling
9	Research On Temporal Action Detection And Action Recognition Based On Deep Learning
10	Research On Spatio-Temporal Action Detection Based On Self-Attention