
Attention With Structure Regularization For Action Recognition

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Chen
Full Text: PDF
GTID: 2428330611966945
Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition from videos, also called video action recognition, is the task of recognizing the actions that people perform in videos. Human actions are rich, diverse, and present throughout social activities and everyday life, so identifying them can substantially improve the understanding of video content. The high-level semantic information that action recognition provides can be applied in areas such as video surveillance, intelligent motion, and behavioral warning, which makes the task highly valuable.

In videos, the key parts for action recognition are the people and objects in motion. In most cases this key information occupies only a small part of each video frame. Highlighting it and ignoring unrelated information would greatly improve both the efficiency and the accuracy of action recognition. Inspired by this idea, many researchers in recent years have combined deep learning with attention mechanisms and applied them to behavior/action recognition systems. An attention mechanism can create attention points in the spatial domain that guide the model to focus its analysis on action-related areas. However, the number of videos available for training a deep network is limited, while the features of action-related areas can vary widely in practice. As a result, when the attention mechanism is implemented with a free-form attention mask, overfitting often causes the attention to be distracted, which weakens its effectiveness for action recognition.

To address these issues, this thesis proposes an ℓ2,1-norm group sparsity regularization for learning structured attention masks. The method is motivated by the biological and cognitive characteristics of human attention, which manifest as local attention. A block-structured sparsity prior is imposed on the masks so that they are constrained by spatial structure, and attention is concentrated on the key parts of the action rather than dispersed. Building on this structured attention module, the thesis proposes a recurrent convolutional model with structured attention for action recognition. The model is composed of a convolutional network and a recurrent network: the convolutional neural network extracts the spatial features of each video frame, and the recurrent neural network uses the temporal continuity between video frames to identify the action. The structured attention enables the model to focus on the key feature regions, which improves both the generalization ability of the model and its recognition performance.

The proposed method is tested on two benchmark datasets. The experimental results show that it significantly improves the accuracy with which attention falls on the key areas of an action, and thereby further improves recognition performance.
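The block-sparsity idea described above can be illustrated with a small sketch. The abstract does not specify how the mask is partitioned into groups, so the non-overlapping square blocks, the block size of 2, and the function name `l21_block_penalty` are assumptions made purely for illustration, not the thesis's actual formulation. The penalty sums the ℓ2 norm of each block, which favors masks whose weight is gathered into a few blocks over masks with the same total weight scattered across many.

```python
import math


def l21_block_penalty(mask, block=2):
    """Sum of per-block L2 norms of a 2-D attention mask (list of rows).

    Assumption: groups are non-overlapping block x block tiles; the
    grouping used in the thesis may differ.
    """
    h, w = len(mask), len(mask[0])
    if h % block or w % block:
        raise ValueError("mask sides must be divisible by the block size")
    total = 0.0
    for i in range(0, h, block):
        for j in range(0, w, block):
            # Squared L2 norm of one tile, then its square root (inner L2).
            sq = sum(mask[r][c] ** 2
                     for r in range(i, i + block)
                     for c in range(j, j + block))
            total += math.sqrt(sq)  # outer L1: plain sum over tiles
    return total


# Two masks with the same total weight (2.0) and the same overall L2 norm.
concentrated = [[0.5, 0.5, 0.0, 0.0],
                [0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]]  # all weight inside one tile
scattered = [[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.5, 0.0, 0.5, 0.0],
             [0.0, 0.0, 0.0, 0.0]]     # one entry in each of four tiles

print(l21_block_penalty(concentrated))  # lower penalty
print(l21_block_penalty(scattered))     # higher penalty
```

In a training setup, a term like this would typically be added to the classification loss with a weighting coefficient, so that minimizing the loss pushes whole blocks of the mask toward zero while leaving a few active blocks over the action region.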
Keywords/Search Tags: Action recognition, Attention, Block-wise sparsity, Deep recurrent network, Convolutional neural network