Research On Human Action Recognition Based On Phase Spectrum Motion Saliency Detection And Self-attention Mechanism

Posted on: 2024-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: G W Xu
GTID: 2568307139996349
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Human action recognition is an active research area in computer vision that aims to recognize action categories from videos. Human actions are activities that occur over a continuous period of time, so improving the accuracy of action recognition models requires fully considering the temporal features of actions. Most existing action recognition methods rely on optical flow to extract these temporal features. However, optical flow is sensitive to slight motion variations between frames, which are often irrelevant background noise such as swaying leaves and audience movements. In addition, the traditional temporal global average pooling layers in convolutional neural networks fail to capture the order and relative importance of temporal features, both of which may be key to distinguishing actions. To address these issues, this study proposes the following:

(1) To enhance the model’s ability to extract temporal features and filter out background noise, this study proposes an action recognition method based on phase spectrum motion saliency detection. First, a two-stream model with both spatial and temporal paths is constructed using the ResNeXt network to strengthen temporal feature extraction. Then, a phase spectrum-based motion saliency detection method extracts the salient features of actions, which are stacked with the video frames and fed into the spatial path for feature extraction, improving the model’s ability to filter out background noise. Experiments on the UCF101 and HMDB51 datasets demonstrate that extracting salient features of actions effectively improves the model’s recognition accuracy.

(2) To extract the temporal and spatial features of actions more fully, this study proposes a post-temporal modeling action recognition method based on a self-attention mechanism. When distinguishing between action categories, some temporal features may be more important than others, and the order of temporal information may be more informative than a simple temporal average. Temporal global average pooling layers ignore these characteristics, so temporal features are not fully utilized. To address this issue, the proposed method replaces the conventional temporal global average pooling layer with a self-attention mechanism, with the aim of extracting temporal and spatial features of actions more effectively. Experimental results demonstrate the effectiveness of this method, which achieves competitive recognition accuracy compared with advanced models on the UCF101 and HMDB51 datasets.
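The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the two ideas under stated assumptions, not the thesis's actual method. It assumes that the motion cue is a simple difference of two grayscale frames, that saliency is obtained by keeping only the Fourier phase of that difference, and that the pooling step is single-head scaled dot-product self-attention followed by a mean over time. All function names, the `sigma` parameter, and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical, and random weights stand in for learned ones.

```python
import numpy as np


def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur with zero-padded borders.

    Assumes each image side is longer than the kernel (2 * int(3 * sigma) + 1).
    """
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(v, kernel, mode="same")

    img = np.apply_along_axis(blur_1d, 0, img)
    return np.apply_along_axis(blur_1d, 1, img)


def phase_spectrum_motion_saliency(prev_frame, next_frame, sigma=3.0):
    """Saliency map in [0, 1] from the phase spectrum of a frame difference.

    Assumption: frame differencing is used as the motion cue; the thesis
    may use a different motion representation.
    """
    motion = next_frame.astype(np.float64) - prev_frame.astype(np.float64)
    if not motion.any():
        return np.zeros_like(motion)  # no motion, nothing salient
    spectrum = np.fft.fft2(motion)
    # Discard the amplitude and keep only the phase of the spectrum.
    phase_only = np.exp(1j * np.angle(spectrum))
    recon = np.fft.ifft2(phase_only)
    saliency = gaussian_blur(np.abs(recon) ** 2, sigma)
    return saliency / saliency.max()


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_temporal_pool(features, w_q, w_k, w_v):
    """Pool per-frame features of shape (T, D) into one clip-level vector.

    Assumption: single-head attention, with a plain mean taken over the
    attended time steps. Returns the clip vector and the (T, T) weights.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    attended = attn @ v  # each time step becomes a weighted mix of all steps
    return attended.mean(axis=0), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy frames: a bright square shifts by two pixels between frames.
    prev_frame = np.zeros((64, 64))
    next_frame = np.zeros((64, 64))
    prev_frame[10:18, 10:18] = 1.0
    next_frame[12:20, 12:20] = 1.0
    saliency = phase_spectrum_motion_saliency(prev_frame, next_frame)
    # Stack the saliency map with the frame as an extra input channel.
    stacked = np.stack([next_frame, saliency], axis=0)
    print(stacked.shape)

    # Toy per-frame features: T = 8 time steps, D = 16 channels.
    feats = rng.normal(size=(8, 16))
    w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
    clip_vector, attn = self_attention_temporal_pool(feats, w_q, w_k, w_v)
    print(clip_vector.shape, attn.shape)
```

In this sketch, setting `w_q` and `w_k` to zero and `w_v` to the identity makes the attention weights uniform, so the result reduces to plain temporal global average pooling; non-trivial weights let some time steps contribute more than others. Without position information the pooled vector is still independent of frame order, so capturing order would need something like a positional encoding, which the abstract does not specify.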
Keywords/Search Tags: deep learning, action recognition, motion saliency features, self-attention mechanism