Research On Key Technologies Of Video Action Recognition Based On Spatio-Temporal Transformer

Posted on:2023-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Qiao

Full Text:PDF

GTID:2568307127460384

Subject:Computer technology

Abstract/Summary:

Video action recognition has become an important research topic in computer vision. Because video data are highly varied, recognizing people or objects in complex scenes is difficult. With the continued development of artificial intelligence, video action recognition has advanced rapidly, and a variety of models based on convolutional neural networks have been proposed for feature extraction and classification of video actions. However, the task still faces serious challenges, such as low recognition accuracy and large numbers of trainable parameters. This thesis therefore proposes a video action recognition method based on a spatio-temporal transformer. The main research questions are: first, how to apply the transformer model to video action recognition; second, how to optimize the network structure while saving GPU hardware cost, so as to improve utilization.

The work on video action recognition based on the spatio-temporal transformer is as follows. First, two variants of the patch embedding module are designed: one without a convolution operation and one with a convolution operation, each extracting features from every block (patch) of an image or video frame. Experiments show that the convolutional variant extracts stronger features and improves network performance to a certain extent. Second, a method based on a spatio-temporal transformer module is proposed. It comprises an LSTM (Long Short-Term Memory) module and a spatio-temporal transformer module: the initial LSTM module is first connected to a fusion layer and then combined with the spatio-temporal transformer module to form the R-TST (LSTM-Time Space Transformer) module. Experimental results show that the model is effective for video action recognition. Finally, ablation experiments on video action recognition are carried out on the HMDB51 and UCF101 datasets. With these as benchmark data, the results show that the proposed model performs action recognition effectively while improving network performance, reducing the number of parameters, saving GPU hardware cost, and improving utilization.
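
The two patch embedding schemes mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no patch size, kernel size, stride, or embedding dimension, so every value below is an assumption, and the function names `patch_embed_linear` and `patch_embed_conv` are hypothetical; this is not the thesis implementation. The non-convolutional variant flattens non-overlapping patches and projects them linearly, while the convolutional variant slides a window that may overlap its neighbours.

```python
import numpy as np

def patch_embed_linear(frame, patch, weight):
    """Non-convolutional scheme (assumed): cut the frame into non-overlapping
    patch x patch blocks, flatten each block, and project it linearly."""
    h, w, c = frame.shape
    gh, gw = h // patch, w // patch
    blocks = frame[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    return blocks @ weight  # shape: (num_patches, embed_dim)

def patch_embed_conv(frame, kernel, stride, weight):
    """Convolutional scheme (assumed): slide a kernel x kernel window with the
    given stride; neighbouring windows overlap when kernel > stride."""
    h, w, c = frame.shape
    tokens = []
    for y in range(0, h - kernel + 1, stride):
        for x in range(0, w - kernel + 1, stride):
            window = frame[y:y + kernel, x:x + kernel].reshape(-1)
            tokens.append(window @ weight)
    return np.stack(tokens)  # shape: (num_windows, embed_dim)

# Illustrative values only: a 32x32 RGB frame and a 16-dimensional embedding.
rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 32, 3))
embed_dim = 16

w_lin = rng.standard_normal((8 * 8 * 3, embed_dim))
tokens_linear = patch_embed_linear(frame, 8, w_lin)

w_conv = rng.standard_normal((12 * 12 * 3, embed_dim))
tokens_conv = patch_embed_conv(frame, 12, 4, w_conv)
```

When the kernel size equals the stride, the convolutional variant reduces to the linear projection of non-overlapping patches, so any difference between the two schemes comes from choices such as overlapping windows, which this sketch only assumes.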

Keywords/Search Tags:

Action Recognition, Attention Mechanism, Vision Transformer, LSTM, GPU Hardware Cost

Related items

1	Research On Action Recognition Algorithm Based On Attention Mechanism And Multi-kernel Convolutional LSTM
2	Action Recognition And Temporal Action Localization Based On Attention Mechanism
3	Research On Human Action Recognition Fusing 2D CNN And Vision Transformer
4	Human Action Recognition Method Based On Bi-LSTM And Attention Mechanism
5	Skeletal Action Recognition Based On Attention Mechanism Preferences And Local Information Enhancement
6	Research On Skeleton Action Recognition Algorithm Based On Attention Mechanism
7	LSTM-based Human Continuous Motion Recognition
8	Research On Human Action Recognition Technology Based On LSTM
9	Research On Action Recognition Technology Based On Video
10	Study On Human Action Recognition Method Based On Deep Learning