Action Recognition Based On Two Stream Spatial-Temporal Attention Network

Posted on:2021-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Lian

GTID:2518306047484114

Subject:Master of Engineering

Abstract/Summary:

Human action recognition in video plays an important role in computer vision. It has broad application prospects and potential economic and social value, and it has attracted the attention of research institutions and researchers around the world. Hand-crafted features generalize poorly, so action recognition methods built on them have significant limitations. In recent years, with the development of deep learning, deep-learning-based action recognition has received wide attention, yet it remains a challenging problem. This thesis summarizes and analyzes existing action recognition methods and makes the following contributions.

First, this thesis constructs a pyramid spatial-channel attention module. Within it, the spatial interactive attention sub-module considers the interaction between spatial positions and derives attention from that interaction, which yields a more accurate feature representation. The channel attention sub-module further improves the semantic feature representation by exploiting the differing importance of different channels. In addition, a pyramid structure is introduced so that features are described at multiple scales, making them more complete.

Second, feature mapping and a temporal interactive attention module are constructed. Feature mapping preserves the temporal order information of dense frames and is simple and effective. The temporal interactive attention module explores the interaction between different video frames to obtain temporal attention, modeling the interdependence between video frames.

Third, a static-motion feature collaborative model is established and optimized with an alternating training scheme. This model jointly explores discriminative static information and motion information and learns the strong complementarity between discriminative static features and motion features, so that mutual guidance between static and motion information enhances feature learning.

After the static and motion features are optimized by the static-motion feature collaborative model, the prediction scores of the static stream and the motion stream are obtained. Because static information and motion information contribute differently to different semantic classes, a two-stream multi-class adaptive fusion method is proposed. It learns the fusion weights of the static stream and the motion stream adaptively and also uses the relationships among classes to further improve network performance.

All methods in this thesis are evaluated on the UCF101 and HMDB51 datasets and compared with state-of-the-art methods. The experimental results show that the proposed methods are effective and competitive with the state of the art. Finally, the proposed methods are summarized, and development trends and directions for action recognition are discussed.
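
The idea of fusing static-stream and motion-stream scores with class-dependent weights can be illustrated with a minimal sketch. This is not the thesis's actual method, which the abstract does not specify. It assumes one hypothetical learnable logit per class (`class_logits`), passed through a sigmoid to give the static stream's weight for that class; the function names are illustrative, and the use of inter-class relationships mentioned in the abstract is not modeled here.

```python
import math


def softmax(xs):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(static_scores, motion_scores, class_logits):
    """Fuse two streams' per-class scores with a per-class weight.

    class_logits[c] is a hypothetical learnable parameter (an assumption,
    not taken from the thesis). A sigmoid maps it to the weight given to
    the static stream for class c; the motion stream gets the remainder.
    """
    fused = []
    for s, m, a in zip(static_scores, motion_scores, class_logits):
        w = 1.0 / (1.0 + math.exp(-a))  # weight in (0, 1) for class c
        fused.append(w * s + (1.0 - w) * m)
    return fused


# Toy example with three classes and made-up stream outputs.
static = softmax([2.0, 0.5, 0.1])
motion = softmax([0.2, 1.5, 0.3])

# Zero logits give equal weights, i.e. plain score averaging.
print(fuse_scores(static, motion, [0.0, 0.0, 0.0]))
```

In a trained model the logits would be learned jointly with the network, so that classes better distinguished by appearance lean on the static stream and classes better distinguished by movement lean on the motion stream.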

Keywords/Search Tags:

Action Recognition, Deep Learning, Attention Mechanism, Collaborative Optimization, Adaptive Fusion

Related items

1	Research On Human Action Recognition Method Based On Deep Learning
2	Human Action Recognition Based On Attention Mechanism And Multi-Modality Feature Fusion
3	Human Skeleton-based Action Recognition Based On Deep Learning
4	Research And Implementation Of Key Techniques For Human Action Recognition Based On Deep Learning
5	Studies On Action Recognition In Video Based On Deep Learning
6	Research On Optimization Technology Of Human Action Recognition In Video
7	Deep Feature Fusion And Attention Models For Video Action Recognition
8	Research On Human Action Recognition Method Integrating Visual Attention Mechanism And Deep Learning
9	Research On Image-based Action Recognition Based On Context And Feature Fusion
10	Human Action Recognition Via Dual Spatio-temporal Network Flow And Attention Mechanism Fusion