Video action recognition is an active research topic in computer vision. With the rapid development of artificial intelligence in recent years, video action recognition has broad application value in areas such as intelligent video surveillance, medical monitoring, and autonomous driving. However, factors in real environments such as complex backgrounds, camera motion, and changes in human pose make the video action recognition task highly challenging. In this paper, we address current problems in action recognition through spatio-temporal feature modeling. The primary contributions are as follows:

(1) We first review video action recognition methods in two broad categories: methods based on hand-crafted feature extraction and methods based on deep learning. Hand-crafted methods are further subdivided into holistic features and local features, while deep learning methods are divided into approaches based on two-stream convolutional networks, 3D convolutional networks, recurrent neural networks, and Transformers. A comparative analysis of current video action recognition methods is presented for the reference of related researchers.

(2) The optical flow information in two-stream networks lacks the ability to capture long-range temporal relationships, and 3D convolutional networks have a large number of parameters and converge slowly. To learn more complete spatio-temporal features, this paper proposes multi-dimensional feature activation residual networks (MFARs). MFARs use 2D convolutional networks to learn temporal feature representations: a motion supplement excitation module models temporal features and excites motion information across temporal channels, while a united information excitation module uses temporal features to excite channel and spatial information, yielding better temporal feature representations. MFARs achieve accuracies of 96.5% on UCF101 and 73.6% on HMDB51. Comparison with current mainstream action recognition models shows that the proposed multi-dimensional feature excitation method effectively represents spatio-temporal features and achieves a better balance between complexity and classification accuracy.

(3) To address the computational complexity and large parameter counts of 3D convolutional networks and Transformer-based methods, this paper introduces a self-attention mechanism based on 2D convolution and designs a long-short temporal feature fusion network to model temporal features. Long-range and short-range temporal features are modeled by separate modules to suppress irrelevant information such as background and to focus on motion regions, thereby improving the accuracy of video action recognition. The effectiveness of the model is verified on two datasets, UCF101 and Something-Something V1, and ablation experiments show that the network improves classification accuracy on action recognition tasks.
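To make the motion excitation idea in contribution (2) concrete, the following is a minimal sketch of a temporal-difference-based excitation block built on 2D convolutions. It is an illustrative assumption of how such a module might be structured (channel reduction ratio, convolution shapes, and the residual gating are our choices), not the thesis's exact design.

```python
import torch
import torch.nn as nn


class MotionExcitation(nn.Module):
    """Sketch of a temporal-difference motion excitation block.

    Channels are gated by the squeezed difference between adjacent
    frames, so channels that carry motion information are amplified.
    All hyperparameters here are illustrative assumptions.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(channels // reduction, 1)
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1)
        self.transform = nn.Conv2d(mid, mid, kernel_size=3, padding=1)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        feat = self.squeeze(x.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)
        # difference between consecutive frames approximates motion
        diff = feat[:, 1:] - feat[:, :-1]
        diff = self.transform(diff.reshape(-1, diff.shape[2], h, w))
        diff = diff.reshape(b, t - 1, -1, h, w)
        # zero-pad the last step so the temporal length matches the input
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        # squeeze spatially, expand back to all channels, and gate
        gate = torch.sigmoid(self.expand(self.pool(diff.reshape(b * t, -1, h, w))))
        return x + x * gate.reshape(b, t, c, 1, 1)
```

The residual form `x + x * gate` keeps the original features intact while boosting motion-relevant channels, which matches the "supplement" framing in the text.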
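The long-short temporal modelling in contribution (3) can likewise be sketched on top of per-frame 2D-CNN features: short-range motion via a small temporal convolution, long-range dependencies via self-attention across frames. Module names, layer sizes, and the additive fusion below are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn


class LongShortTemporalFusion(nn.Module):
    """Sketch of long/short temporal modelling over frame features.

    Short-range motion is captured with a depthwise temporal
    convolution; long-range dependencies with self-attention across
    frames; the two branches are fused by addition.  All design
    details are illustrative assumptions.
    """

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        # depthwise 1D conv over the time axis (short-range branch)
        self.short = nn.Conv1d(channels, channels, kernel_size=3,
                               padding=1, groups=channels)
        # self-attention across frames (long-range branch)
        self.long = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) -- frame features after spatial pooling
        short = self.short(x.transpose(1, 2)).transpose(1, 2)
        q = self.norm(x)
        long, _ = self.long(q, q, q)
        return x + short + long
```

Keeping both branches as residuals over 2D-CNN features avoids the parameter and compute cost of full 3D convolutions or video Transformers, which is the trade-off the abstract emphasises.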