
Research On Video Action Recognition Based On Spatial-temporal Feature Fusion

Posted on: 2023-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
Full Text: PDF
GTID: 2568306788455404
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of computer vision and artificial intelligence, human action recognition by computer has become one of the hot topics in the field. Its main task is to extract effective spatial and temporal features from an input video and then classify them on the principle that samples of the same class should be similar while samples of different classes should be separable. Action recognition has been widely studied in applications such as intelligent surveillance, human-computer interaction, and motion detection. This paper surveys the current mainstream action recognition algorithms based on spatial-temporal feature fusion and proposes three spatial-temporal feature fusion methods for extracting video features with strong expressive ability and robustness. The main work of this paper is as follows:

(1) This paper first groups the mainstream deep-learning-based algorithms into three categories: two-dimensional convolutional neural network algorithms, three-dimensional convolutional neural network algorithms, and self-attention-based architectures. It then analyzes the structural advantages and disadvantages of the different methods and compares their reported recognition results on the UCF101, HMDB51, and Something-Something datasets.

(2) To compensate for the inability of two-dimensional convolutional networks to relate feature context and learn global information, this paper applies a self-attention mechanism to the high-level feature maps output by the convolutional network, along the temporal and spatial dimensions respectively. Multiscale features are extracted by replacing the linear transformation matrices of self-attention with spatial one-dimensional convolutions in different directions and temporal one-dimensional convolutions with different dilation rates, which enriches the model's ability to express features. Ablation experiments show that the spatial-temporal convolutional attention designed in this paper effectively improves the recognition accuracy of the model.

(3) To address the high computational complexity of three-dimensional convolution, the output features of a two-dimensional spatial convolution and a one-dimensional temporal convolution are fused by squeeze-and-excitation along the time dimension, forming a spatial-temporal feature interaction module. Experimental results show that this fusion method significantly improves recognition performance compared with both 3D and P3D convolutions.

(4) To achieve better spatial-temporal feature interaction during propagation, a multilevel feature aggregation module is designed based on channel splitting and peer residual connections. The module divides the network along the channel dimension, links the input of the three-dimensional convolution to the output of the two-dimensional convolution through residual connections to extract video-level features with a larger receptive field, and then aggregates the outputs of the two-dimensional and three-dimensional convolutions along the channel dimension to enrich feature diversity. In addition, a dynamic information enhancement module is designed, which strengthens the feature weights of dynamic regions along the time dimension and suppresses interference from irrelevant information. Ablation experiments show that both modules improve classification accuracy on action recognition tasks.
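The multiscale temporal branch described in (2) rests on dilated one-dimensional convolution: the same kernel applied at different dilation rates covers different temporal extents. The following is a minimal pure-Python sketch of that idea, not the thesis's actual implementation; the function name and the difference kernel are illustrative.

```python
def dilated_conv1d(x, kernel, dilation):
    """1-D temporal convolution with a given dilation rate (valid padding):
    each output position mixes inputs spaced `dilation` steps apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # temporal extent the kernel covers
    return [sum(kernel[j] * x[t + j * dilation] for j in range(k))
            for t in range(len(x) - span + 1)]

# Multiscale temporal features: run the same signal through several
# dilation rates and concatenate, so short and long motions both register.
signal = [1, 2, 4, 8, 16, 32]
multiscale = dilated_conv1d(signal, [1, -1], 1) + dilated_conv1d(signal, [1, -1], 2)
```

A larger dilation widens the receptive field without adding parameters, which is why mixing several rates yields multiscale features at low cost.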
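The fusion step in (3) can be pictured as a squeeze-and-excitation gate over the summed branch outputs. The sketch below is a deliberately simplified stand-in for the thesis's module, assuming one feature value per channel and a given excitation weight vector (real SE blocks learn a small two-layer bottleneck instead).

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def se_fuse(spatial, temporal, excite_w):
    """Fuse a 2-D spatial branch and a 1-D temporal branch (one feature
    value per channel) with a squeeze-and-excitation style gate."""
    fused = [s + t for s, t in zip(spatial, temporal)]   # add branch outputs
    squeeze = sum(fused) / len(fused)                    # global average pool
    # Excitation: per-channel sigmoid gates driven by the squeezed statistic.
    gates = [sigmoid(w * squeeze) for w in excite_w]
    return [g * f for g, f in zip(gates, fused)]
```

The appeal of this factorized 2D+1D design over full 3D convolution is that the expensive spatio-temporal kernel is replaced by two cheap ones plus a lightweight gate.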
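The dynamic information enhancement module in (4) weights features by how much they change over time. A hedged pure-Python sketch of one way to realize that, assuming each frame is a flat feature vector; the residual `1 + w` form is an assumption chosen so static frames are damped rather than erased, not a detail taken from the thesis.

```python
def motion_enhance(frames):
    """Reweight per-frame feature vectors by their frame-to-frame change,
    strengthening dynamic regions and damping static background."""
    mags = [0.0]                                   # first frame: no motion cue
    for prev, cur in zip(frames, frames[1:]):
        mags.append(sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur))
    peak = max(mags) or 1.0                        # normalize weights to [0, 1]
    weights = [m / peak for m in mags]
    # Residual-style enhancement: keep the original feature and add the
    # motion-weighted part on top, so unchanged frames pass through intact.
    return [[v * (1.0 + w) for v in f] for w, f in zip(weights, frames)]
```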
Keywords/Search Tags:Action recognition, Convolution network, Self-attention mechanism, Deep learning