Convolutional neural networks have emerged as the most widely used deep learning technology for action recognition. However, convolutional neural network-based action recognition models have certain limitations, such as insufficient feature extraction capability and weak representation of critical features. This research proposes three action recognition models, addressing feature extraction, feature representation, and the exploitation of local and global features.

First, to overcome the inference latency of two-stream 3D convolutional networks, this research proposes an action recognition model that integrates channel attention and knowledge distillation. The model introduces channel attention into 3D ResNeXt101, takes continuous video frames as input, and, following a knowledge distillation strategy, builds a weighted linear combination loss function to train both a teacher model and a student model; the teacher guides the student to learn important information, improving the student's recognition accuracy. At inference time the student model does not use optical flow, which reduces the amount of computation. Experiments on the UCF101 and HMDB51 datasets show that the proposed model is feasible and effective.

Second, given the heavy computational load imposed by the two-stream design of 2D convolutional networks, this research presents an action recognition model based on spatiotemporal and motion features. The model samples input image sequences sparsely and inserts a time shift module, a spatiotemporal excitation module, and a motion excitation module into 2D ResNet50 to integrate the spatiotemporal and motion features of the encoded video. In addition, a spatiotemporal feature enhancement module strengthens the spatiotemporal information of the input video clips, yielding a more substantial recognition effect. The model performs well on the Something-Something V1 and Something-Something V2 datasets.

Third, to address the weakness of action recognition models in modeling local and global information in video, this research proposes an action recognition model that combines spatiotemporal multi-scale convolution with self-attention. The model uses sparse sampling, adds a spatiotemporal grouping multi-scale module to 2D ResNet50, and adaptively learns essential global video information via self-attention. It performs well on the Something-Something V1 and Diving48 video action recognition datasets.
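The weighted linear combination loss used for knowledge distillation in the first model can be sketched as follows. This is a minimal pure-Python illustration of the generic teacher-student distillation loss (hard-label cross-entropy plus temperature-scaled soft-target divergence); the weighting factor `alpha` and temperature are illustrative assumptions, not the values used in this research.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    # Hard-label loss against the ground-truth class index.
    probs = softmax(student_logits)
    return -math.log(probs[label])

def kl_divergence(p, q):
    # KL(p || q) between two discrete probability distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=4.0):
    # Weighted linear combination of the hard-label loss and the
    # soft-target loss; alpha and temperature are hypothetical values.
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # The T^2 factor keeps soft-target gradients comparable in scale,
    # as is standard in distillation formulations.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft
```

When the teacher and student agree exactly, the soft term vanishes and only the weighted hard-label loss remains, which is one quick sanity check on the combination.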
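The time shift module inserted into 2D ResNet50 in the second model exchanges information between neighboring frames at zero extra multiply-add cost. A minimal sketch of the standard shift operation, assuming list-of-lists features of shape [T][C] and an illustrative `shift_div` fraction (the exact fraction used in this research is not stated here):

```python
def temporal_shift(frames, shift_div=4):
    # frames: per-frame feature vectors, shape [T][C].
    # The first C//shift_div channels are shifted backward in time,
    # the next C//shift_div forward; the rest stay in place.
    # Out-of-range positions are zero-padded, as in the usual
    # temporal-shift formulation.
    T = len(frames)
    C = len(frames[0])
    fold = C // shift_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:          # take this channel from the next frame
                src = t + 1
            elif c < 2 * fold:    # take this channel from the previous frame
                src = t - 1
            else:                 # untouched channels
                src = t
            if 0 <= src < T:
                out[t][c] = frames[src][c]
    return out
```

Because the shift only re-indexes existing activations, the subsequent 2D convolutions see a mixture of past, present, and future frames, which is what lets a 2D backbone encode temporal structure.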