Research On Deep Learning Algorithms For Human Action Recognition

Posted on: 2024-08-03  Degree: Master  Type: Thesis
Country: China  Candidate: Q Yu  Full Text: PDF
GTID: 2568307127460904  Subject: Computer technology
Abstract/Summary:
Human action recognition has high research value and is widely used in fields such as human-computer interaction, virtual reality, smart homes, and smart security. In recent years, with improvements in hardware and advances in technology, deep learning has been widely applied to human action recognition. Unlike traditional image processing, human action recognition must also extract temporal information. RGB video is the most common data type, but the 3D convolutions used to process it have a large number of parameters and a high computational cost, and they extract features at only a single scale. Skeleton data is more concise and more robust to environmental changes, but the graph convolutions used to process it have a small receptive field: connections between distant joints are difficult to capture and information is concentrated in nearby joints, which makes the network prone to overfitting. In addition, although the attention mechanism plays an important role in human action recognition, traditional models do not use it to weight the importance of elements along the channel, spatial, and temporal dimensions. This thesis proposes two human action recognition models to address these problems and validates them experimentally on several datasets. The main work is as follows:

(1) For RGB video-based action recognition, a multiscale spatial-temporal attentional convolution network is proposed. The three-dimensional convolution is decomposed into a parallel two-dimensional spatial convolution and one-dimensional temporal convolution, which reduces the number of parameters and the computational complexity. Dilated convolution is then applied, giving the network a larger receptive field and letting it extract spatial-temporal features at multiple scales without adding parameters, which improves its ability to model long videos. Attention is applied along three dimensions (spatial, temporal, and channel) to capture the important parts of the whole feature map more comprehensively, so that the network focuses on important features during extraction, further strengthening the model's feature extraction ability. In experiments on the HMDB-51 and UCF101 datasets, the model's accuracy surpasses that of most RGB video-based human action recognition algorithms, demonstrating its effectiveness.

(2) For skeleton-based action recognition, an attentional contextual graph convolution network is proposed. Computing context information captures the connection between each joint and every other joint, so the dependencies between distant joints can be obtained directly from the context information. The structure of the adjacency graph is also optimized to carry richer semantic information, which increases the receptive field and flexibility of the graph convolution and addresses the overfitting problem of the graph convolution network. A spatial-temporal joint attention module is embedded in the network to capture important elements at spatial and temporal locations in each channel and fuse them, so that the network learns the important parts of the features more accurately and efficiently. In extensive experiments on three datasets (NTU RGB+D 60, NTU RGB+D 120, and Northwestern-UCLA), the model's accuracy exceeds that of most skeleton-based human action recognition algorithms, demonstrating its effectiveness.
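
The parameter saving described in (1) can be checked with simple arithmetic. The sketch below is illustrative only and is not the thesis's implementation: the channel width (64), kernel size (3), and dilation rates (1, 2, 3) are assumed values that do not appear in the abstract, biases are omitted, and both parallel branches are assumed to map the same input channels to the same output channels. The helper names are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3D convolution with a kt x kh x kw kernel (no bias)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_out, k):
    """Weights of a parallel spatial (1 x k x k) branch plus a temporal (k x 1 x 1) branch."""
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_in, c_out, k, 1, 1)
    return spatial + temporal


def effective_kernel(k, dilation):
    """Extent covered along one axis by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


# Assumed sizes for illustration; the abstract does not give them.
C_IN, C_OUT, K = 64, 64, 3

full = conv3d_params(C_IN, C_OUT, K, K, K)   # 64 * 64 * 27 = 110592
split = factorized_params(C_IN, C_OUT, K)    # 64 * 64 * (9 + 3) = 49152

print("full 3D conv weights:     ", full)
print("factorized branch weights:", split)
print("ratio:                    ", round(split / full, 3))

# Dilation widens the covered extent while the number of taps (weights) stays at K.
for d in (1, 2, 3):
    print(f"dilation {d}: {K} taps cover {effective_kernel(K, d)} positions")
```

Under these assumptions the factorized pair uses 12/27 of the weights of the full 3 x 3 x 3 kernel, and a 3-tap kernel with dilation 2 or 3 spans 5 or 7 positions with the same three weights, which is the sense in which the receptive field grows "without adding parameters".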
Keywords/Search Tags:Human action recognition, Deep learning, Dilated convolution, Attention mechanism, Graph convolution network