Action Recognition Based On Two Stream Spatial-Temporal Attention Network

Posted on: 2021-04-10; Degree: Master; Type: Thesis
Country: China; Candidate: Z Y Lian; Full Text: PDF
GTID: 2518306047484114; Subject: Master of Engineering
Abstract/Summary:
Human action recognition in videos plays an important role in computer vision. It has broad application prospects as well as potential economic and social value, and it has attracted the attention of research institutions and researchers around the world. Hand-crafted features generalize poorly, so action recognition methods based on them are severely limited. In recent years, with the development of deep learning, action recognition based on deep learning has attracted wide attention. However, action recognition remains a challenging problem. This thesis summarizes and analyzes existing action recognition methods and makes the following contributions.

Firstly, this thesis constructs a pyramid spatial-channel attention module. Within it, the spatial interactive attention sub-module considers the interaction between spatial positions and derives attention from that interaction, which yields a more accurate feature representation. The channel attention sub-module further improves the semantic feature representation by exploiting the differing importance of different channels. Furthermore, a pyramid structure is introduced so that the features are described at multiple scales, making them more complete.

Secondly, a feature mapping and a temporal interactive attention module are constructed. The feature mapping preserves the temporal order information of dense frames and is simple and effective. The temporal interactive attention module explores the interaction between different video frames and derives temporal attention from it, modeling the interdependence between video frames.

Thirdly, a static-motion feature collaborative model is established and optimized with an alternating training scheme. This model collaboratively explores discriminative static information and motion information and learns the strong complementarity between discriminative static features and motion features, so that the mutual guidance between static and motion information can enhance feature learning. After the static features and motion features are optimized by the static-motion feature collaborative model, the prediction scores of the static stream and the motion stream are obtained. Because static information and motion information contribute differently to different semantic classes, a two-stream multi-class adaptive fusion method is proposed. It not only learns the fusion weights of the static stream and the motion stream adaptively, but also uses the relationships among classes to further improve network performance.

All the methods in this thesis are tested on the UCF101 and HMDB51 datasets and compared with state-of-the-art methods. The experimental results show that all the proposed methods are effective and advanced. Finally, the proposed methods are summarized, and the development trends and directions of action recognition are discussed.
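
The idea of fusing the two streams with class-dependent weights can be illustrated with a small sketch. This is not the thesis's method, whose details the abstract does not give. It is a minimal example that assumes each class c has one scalar parameter a_c, that the fusion weight is w_c = sigmoid(a_c), and that the fused score is w_c * static_c + (1 - w_c) * motion_c. The names `fuse_scores` and `class_logits` and all score values are hypothetical. In a real system the parameters would be learned from data, and the use of relationships among classes is not modeled here.

```python
import math


def sigmoid(x):
    """Map a real-valued parameter to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(static_scores, motion_scores, class_logits):
    """Per-class convex combination of static-stream and motion-stream scores.

    class_logits[c] is a hypothetical per-class parameter (assumed, not from
    the thesis). A larger value trusts the static stream more for class c.
    """
    if not (len(static_scores) == len(motion_scores) == len(class_logits)):
        raise ValueError("all inputs must have one entry per class")
    fused = []
    for s, m, a in zip(static_scores, motion_scores, class_logits):
        w = sigmoid(a)  # weight of the static stream for this class
        fused.append(w * s + (1.0 - w) * m)
    return fused


# Made-up scores for three classes from each stream.
static_scores = [2.0, 0.5, 0.1]
motion_scores = [0.2, 1.5, 0.3]

# All-zero parameters give w = 0.5 for every class, i.e. plain averaging.
fused = fuse_scores(static_scores, motion_scores, [0.0, 0.0, 0.0])
probs = softmax(fused)
predicted = max(range(len(probs)), key=probs.__getitem__)

# Trusting the motion stream for class 1 changes the decision.
fused_motion = fuse_scores(static_scores, motion_scores, [0.0, -10.0, 0.0])
predicted_motion = max(range(len(fused_motion)), key=fused_motion.__getitem__)
```

With equal weights the static stream's strong score for class 0 wins. Once class 1 leans on the motion stream, class 1 is predicted instead, which is the kind of class-specific behavior that a single global fusion weight cannot express.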
Keywords/Search Tags:Action Recognition, Deep Learning, Attention Mechanism, Collaborative Optimization, Adaptive Fusion