
Deep-learning-based Action Recognition

Posted on: 2022-06-30    Degree: Master    Type: Thesis
Country: China    Candidate: L Y Hu    Full Text: PDF
GTID: 2518306509484484    Subject: Computer Science and Technology
Abstract/Summary:
Recently, due to the rapid development of video platforms, the continuous improvement of mobile cameras, and the wide adoption of mobile communication, the number of videos has grown explosively. How to understand videos has therefore become a hot research topic. Action recognition is a fundamental and critical problem in video understanding. In early years, researchers typically used hand-crafted features to capture spatial-temporal cues for classification. Nowadays, deep learning has made great progress in action recognition thanks to its powerful ability to extract features and to fit large-scale data. However, several problems remain for existing deep-learning-based action recognition methods.

First, it is difficult for existing methods to recognize complex and rapidly changing actions in RGB videos, for two reasons. On the one hand, videos containing complex actions often include cluttered backgrounds and many unrelated people, which easily distract existing methods and make it hard to locate the actor. On the other hand, recognizing complex actions often requires models to combine details around the human body rather than relying only on holistic features, yet most methods based on convolutional neural networks fail to establish visual relationships between different parts of an image. We therefore propose a novel human-focused approach that establishes visual relationships within images and outperforms mainstream approaches on two large public datasets.

Second, thanks to its compactness and its independence from background and other irrelevant factors, skeleton data is well suited to high-performance action recognition. However, existing methods still have drawbacks in modeling the relationships between joints and in updating joint features. Early methods directly reshape skeleton data and extract features from it, ignoring the internal relationships between joints. Graph convolutional networks then treat joints as graph nodes and use a fixed adjacency matrix to model their relationships. Some methods further introduce a self-attention mechanism to model the relationships adaptively and to expand the spatial receptive field, but beneficial information is lost after many layers. We propose a spatial-temporal graph attention network to address this problem, which surpasses mainstream methods on three large public datasets.
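To make the contrast in the last paragraph concrete, the following PyTorch sketch compares a graph convolution over skeleton joints with a fixed adjacency matrix against a self-attention layer that predicts the joint-joint relationships adaptively. It is a minimal illustration only: the module names, tensor shapes, and the identity-matrix placeholder graph are assumptions for this sketch and do not reproduce the thesis's actual spatial-temporal graph attention network.

# Minimal sketch (not the thesis's model): fixed-adjacency graph convolution
# versus self-attention with a data-dependent adjacency over skeleton joints.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedGraphConv(nn.Module):
    """Graph convolution using a fixed, row-normalized adjacency matrix."""

    def __init__(self, in_channels, out_channels, adjacency):
        super().__init__()
        # adjacency: (V, V) skeleton graph; normalize once and store as a buffer.
        deg = adjacency.sum(dim=1).clamp(min=1)
        self.register_buffer("A", adjacency / deg.unsqueeze(1))
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: (batch, V, C) joint features; aggregate fixed neighbors, then project.
        return self.proj(torch.einsum("uv,bvc->buc", self.A, x))


class AttentionGraphConv(nn.Module):
    """Graph convolution whose adjacency is predicted by self-attention."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.query = nn.Linear(in_channels, out_channels)
        self.key = nn.Linear(in_channels, out_channels)
        self.value = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        # x: (batch, V, C); the attention map acts as an adaptive adjacency matrix.
        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v


if __name__ == "__main__":
    V, C = 25, 64                        # e.g. 25 joints with 64-dim features
    A = torch.eye(V)                     # placeholder graph (self-loops only)
    x = torch.randn(8, V, C)             # batch of 8 skeletons
    print(FixedGraphConv(C, 128, A)(x).shape)    # torch.Size([8, 25, 128])
    print(AttentionGraphConv(C, 128)(x).shape)   # torch.Size([8, 25, 128])

The fixed variant can only propagate information along edges hard-coded in A, whereas the attention variant lets every joint attend to every other joint, which is the "adaptive relationship modeling and larger spatial receptive field" the abstract refers to.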
Keywords/Search Tags: Action Recognition, Visual Relationship, Self-Attention Mechanism