Study On Human Action Recognition Based On Non-local Spatial-temporal Residual Attention Mechanism

Posted on:2021-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Luo

Full Text:PDF

GTID:2518306107985799

Subject:Instrument Science and Technology

Abstract/Summary:

Human action recognition is one of the most active topics in computer vision, with a wide range of applications and considerable research value. Current approaches fall into two groups: handcrafted-feature methods and deep-learning methods. Handcrafted features must be designed manually and are easily influenced by the designer's experience. Deep-learning methods, which use neural networks to learn features adaptively, have therefore become the main research direction. Despite the progress made, several problems remain. First, almost every model assigns the same weight to every part of a video, which introduces noise that is irrelevant to recognition. Second, motion features are extracted from video by a manually designed algorithm, a step the recognition model cannot complete automatically. Finally, current convolutional models can extract only local information because of the limitations of the convolution kernel. To address these problems, the following work has been done:

(1) A temporal attention module is proposed. The module consists of intra-frame attention and inter-frame attention. Using non-local connections, the two sub-modules capture the global dependencies within and between frames. Analyzing the dependencies captured by the intra-frame and inter-frame sub-modules yields the probability that a frame belongs to the foreground and indicates whether the frame differs markedly from other frames. This information lets the temporal attention module ignore background and redundant frames and pay more attention to the frames most relevant to the recognition result.

(2) A spatial attention mechanism for video is constructed with non-local connections. Non-local connections treat highly dependent points as key points, and the model pays more attention to them. Because the features extracted by a neural network are redundant, the dependencies between feature channels are also modeled, and an attention score over the output channels is produced so that the model ignores highly repetitive, redundant features. This information makes the model focus further on the key points of motion.

(3) Based on the definition of optical flow, a motion feature is extracted. The spatio-temporal gradient is computed directly on the attention mask output by the attention mechanism to express motion, which requires only spatial filtering and subtraction. The whole motion representation model is differentiable and can be integrated into any neural network for further learning.

Experiments were conducted on the UCF-101 and HMDB51 datasets, where recognition accuracies of 97.1% and 78.0%, respectively, were obtained. The attention mechanism improves the accuracy of the baseline recognition model by 7.6% and 7.2%. Compared with other models that also use attention mechanisms, the accuracy of the proposed model is higher by at least 1.6% and 5.3%. Compared with methods that use optical-flow-based features, it is higher by 1.1% and 3.8%.
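The abstract names two mechanisms without giving formulas: attention built from non-local connections, and a motion feature obtained from spatial filtering and subtraction. The NumPy sketch below illustrates one possible reading of each and is not the thesis implementation. The function names, the dot-product similarity with softmax normalization, the residual addition, and the choice of a Sobel kernel are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

# Assumed 3x3 Sobel kernel; the abstract only says "spatial filtering".
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_attention(features):
    """Hypothetical non-local block with a residual connection.

    features: (N, C) array holding N positions (pixels or frames)
    with C channels each. Every position attends to every other one,
    so the dependencies are global rather than kernel-local.
    Returns the (N, C) output and the (N, N) attention weights.
    """
    scores = features @ features.T / np.sqrt(features.shape[1])
    weights = softmax(scores, axis=-1)
    return features + weights @ features, weights


def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel (edge padding)."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def motion_feature(prev_frame, next_frame):
    """Hypothetical spatio-temporal gradient of two 2-D maps.

    Spatial gradients come from filtering, the temporal gradient from
    subtracting consecutive maps. Returns an array of shape (3, H, W)
    ordered as (d/dx, d/dy, d/dt).
    """
    prev_frame = np.asarray(prev_frame, dtype=float)
    next_frame = np.asarray(next_frame, dtype=float)
    grad_x = filter3x3(next_frame, SOBEL_X)
    grad_y = filter3x3(next_frame, SOBEL_X.T)
    grad_t = next_frame - prev_frame
    return np.stack([grad_x, grad_y, grad_t])
```

In this reading, `nonlocal_attention` could be applied to per-frame descriptors for temporal attention or to flattened spatial positions for spatial attention, and `motion_feature` would take two consecutive attention masks as input. Both functions use only differentiable array operations, which matches the abstract's remark that the motion representation can be placed inside a neural network.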

Keywords/Search Tags:

Action Recognition, Non-local Connection, Temporal Attention, Spatial Attention, Spatial-temporal Gradient Feature

Related items

1	Research Of Human Action Recognition Based On Composite Spatial And Temporal Feature
2	Action Recognition Based On Two Stream Spatial-Temporal Attention Network
3	Attention Mechanism Based Deep Network For Human Action Recognition In Video
4	Research And Implementation Of Human Action Recognition Based On Temporal And Spatial Relationship Enhancement
5	Research On Human Action Recognition Of Video Content Based On Spatial-Temporal Features
6	Research And Implementation Of Video Action Recognition Based On Long-Time Feature Fusion And Attention Mechanism
7	The Research On Robust Spatial-temporal Co-occurrence Feature Extraction Algorithm For Facial Action Unit Detection
8	Temporal Action Localization In Massive Multimedia Video Scenario
9	Action Proposal And Activity Recognition Based On Attention LSTM
10	Research On Video Person Re-identification Method Based On Spatial-temporal Attention Mechanism