Research On Human Action Recognition Method Based On Spatiotemporal Features

Posted on: 2024-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2568307097457054
Subject: Control theory and control engineering
Abstract/Summary:
Human action recognition is a technology that identifies specific actions by analyzing human motion patterns and sequences in videos. It has been widely applied in fields such as network security, intelligent driving, and medical health. Thanks to the strong feature extraction ability of deep learning, human action recognition methods have proliferated. However, deep-learning-based human action recognition algorithms often rely on a large amount of training data, and the rich spatiotemporal features in videos place higher demands on the model, which limits the recognition accuracy of existing methods. To further improve the performance of human action recognition models, this thesis carries out the following research work:

(1) To address insufficient data samples and the insufficient feature extraction ability of human action recognition models, this thesis proposes a temporal segment human action recognition method based on Mixup data augmentation and the Convolutional Block Attention Module (CBAM). First, temporal segment sampling is applied to the video data to reduce the computational cost of the network and to let the feature extraction network obtain structural information about the entire video. Second, to fully extract spatial features related to human actions, a ResNet50 network with CBAM added is used to extract spatial features from single frames, and an average-pooling segment fusion strategy fuses the spatial features of different frames to capture long-term temporal information. Finally, the Mixup data augmentation strategy is used to train the network, randomly mixing training samples within the same batch to increase the diversity of the data samples. Experimental results show that the method achieves good recognition performance on the UCF101 and HMDB51 datasets, reduces overfitting, and improves recognition accuracy on the two datasets by 1.51% and 4.84%, respectively, compared with the original TSN network.

(2) While keeping model complexity under control, this thesis builds on the work in (1) and proposes TES, an efficient human action recognition network, to further enhance the temporal feature extraction ability of the model. The model inserts a temporal excitation shift (TES) module into each residual block of the ResNet50 network, enabling the network to better learn short-term motion features in videos. The TES module acts on the input feature map and consists of two parts. The temporal excitation module (TEM) assigns weights along the temporal dimension of the input feature map, using attention to highlight the video frames that have a significant impact on the classification result. The temporal shift module (TSM) shifts and exchanges part of the channels of the input feature map between neighboring frames, so that the features of each video frame receive information from adjacent frames, thereby implicitly extracting motion features. The TES module integrates well into the ResNet50 network and the temporal segment network, capturing both short-term motion features and long-term temporal information in videos. Experimental results show that the proposed TES model effectively extracts the spatiotemporal features of human actions in videos. It achieves recognition accuracies of 85.63%, 49.91%, 73.5%, 95.53%, and 76.1% on the Diving48, Something-Something V1, Kinetics-400, UCF101, and HMDB51 datasets, respectively, with an inference time of 0.035 seconds per video. The TES model therefore achieves a good balance between recognition accuracy and speed.
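The following is a minimal sketch, not the thesis implementation, of three operations the abstract names: temporal segment sampling, Mixup, and the temporal shift. It uses plain Python lists so it runs without dependencies; a real model would apply the same index arithmetic to tensors. The function names, the Beta(alpha, alpha) mixing weight with `alpha=0.2`, the default `fold_div=8`, the shift directions, and the zero padding at the clip boundaries are assumptions taken from the commonly published formulations of these techniques, since the abstract does not state the thesis's own settings.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Split a video into equal segments and pick one random frame from each."""
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        # Guard against empty segments when the video is very short.
        indices.append(rng.randrange(start, max(end, start + 1)))
    return indices


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels between neighboring frames.

    frames is a list of T per-frame channel lists of length C (spatial
    dimensions are omitted for brevity). The first C // fold_div channels
    are taken from the next frame, the following C // fold_div channels
    from the previous frame, and the rest stay in place. Positions with
    no neighboring frame are filled with zeros.
    """
    t_len, c_len = len(frames), len(frames[0])
    fold = c_len // fold_div
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < fold:
                src = t + 1      # channel group read from the next frame
            elif c < 2 * fold:
                src = t - 1      # channel group read from the previous frame
            else:
                src = t          # unshifted channels
            if 0 <= src < t_len:
                out[t][c] = frames[src][c]
    return out


if __name__ == "__main__":
    # One frame index per segment of a hypothetical 30-frame clip.
    print(sample_segment_indices(30, 3, random.Random(0)))

    # Mixing weight drawn from Beta(alpha, alpha); alpha = 0.2 is an assumption.
    lam = random.betavariate(0.2, 0.2)
    print(mixup([1.0, 3.0], [1.0, 0.0], [3.0, 1.0], [0.0, 1.0], lam))

    # Three frames with eight channels each, values encode (frame, channel).
    clip = [[t * 10 + c for c in range(8)] for t in range(3)]
    for row in temporal_shift(clip, fold_div=4):
        print(row)
```

Because the shift only moves existing values between frames, it adds no parameters or multiplications, which is consistent with the abstract's emphasis on balancing accuracy and speed.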
Keywords/Search Tags:Human Action Recognition, Temporal Excitation, Temporal Shift, Data Augmentation, Convolutional Attention