
Action Recognition Algorithm Based On Spatio-Temporal Scene Graph And Its Application Research

Posted on: 2024-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Cui
Full Text: PDF
GTID: 2568307067994629
Subject: Electronic information
Abstract/Summary:
The task of using computers to infer human actions in videos is called action recognition, and it is one of the important tasks in the field of video understanding. With the development of deep learning over the past decade, deep learning-based action recognition algorithms have made remarkable progress. However, existing algorithms usually regard a human action as a single event and do not consider the composition of the action, namely a series of dynamic interactions between humans and the surrounding scene, which limits their ability to learn actions. The composition of human actions in videos can be described well using spatio-temporal scene graphs, yet the application of spatio-temporal scene graphs to action recognition has been little explored. On the other hand, traditional video retrieval relies entirely on video-related text data without considering the content of the video. Introducing video understanding algorithms into a video retrieval system makes retrieval based on video content possible, thereby improving retrieval effectiveness. In view of the above problems, this paper mainly conducts the following research:
1. An action recognition algorithm based on GNN-RNN is proposed: This paper introduces the spatio-temporal scene graph into the field of action recognition and proposes an action recognition algorithm that takes the spatio-temporal scene graph as input data. The graph encoding and temporal encoding modules are implemented with a traditional graph neural network (GNN) and a recurrent neural network (RNN), respectively. Experimental results show that the proposed GNN-RNN action recognition algorithm outperforms the baseline action recognition algorithm and achieves good performance on the action recognition task.
2. A Transformer-based action recognition algorithm is proposed: To further improve the performance of the GNN-RNN action recognition algorithm, this paper builds on it to propose a Transformer-based action recognition algorithm, which learns action features from the spatio-temporal scene graph through a graph Transformer encoder and a temporal Transformer encoder. The experimental results show that the temporal Transformer encoder performs better than the traditional RNN-based temporal encoding module in processing graph representation sequences, and that the GNN-TTE action recognition algorithm, which combines the traditional GNN-based graph encoding module with the temporal Transformer encoder, performs better relative to the baseline algorithm (the GNN-RNN action recognition algorithm).
3. A text-video retrieval system based on video action and spatio-temporal scene graph is designed and implemented: Based on the action recognition algorithms proposed in this paper, a text-video retrieval system based on video actions and spatio-temporal scene graphs is designed and implemented. The system evaluation shows that the text-video retrieval system can effectively complete retrieval according to video content, which also demonstrates the application value of the action recognition algorithms proposed in this paper.
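To make the graph-then-sequence idea in the abstract concrete, here is a minimal pure-Python sketch, not the thesis's implementation. It assumes a toy spatio-temporal scene graph (the `frames` data, node names, and relation labels are hypothetical), replaces the GNN with one round of untrained mean-neighbour message passing, and replaces the RNN with an elementwise recurrence using fixed scalar weights. Relation labels are ignored here for brevity; a real model would learn all parameters and add a classifier on top of the video representation.

```python
import math

# Hypothetical toy data: a spatio-temporal scene graph stored as one graph per frame.
# Each frame maps node name -> feature vector and lists (subject, relation, object) edges.
frames = [
    {"nodes": {"person": [1.0, 0.0], "cup": [0.0, 1.0]},
     "edges": [("person", "looking_at", "cup")]},
    {"nodes": {"person": [1.0, 0.0], "cup": [0.0, 1.0]},
     "edges": [("person", "holding", "cup")]},
]

def encode_graph(frame):
    """Graph encoding stand-in: one round of mean-neighbour message passing, then mean pooling."""
    nodes = frame["nodes"]
    neighbours = {name: [] for name in nodes}
    for subj, _rel, obj in frame["edges"]:
        # Assumption: edges are treated as undirected and relation labels are ignored.
        neighbours[subj].append(obj)
        neighbours[obj].append(subj)
    updated = []
    for name, feat in nodes.items():
        msgs = [nodes[n] for n in neighbours[name]]
        if msgs:
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            agg = [0.0] * len(feat)
        # Mix each node's own feature with the aggregated neighbour message.
        updated.append([0.5 * f + 0.5 * a for f, a in zip(feat, agg)])
    # Mean-pool node features into a single vector for the frame.
    return [sum(col) / len(updated) for col in zip(*updated)]

def rnn_encode(sequence, w_in=0.5, w_rec=0.5):
    """Temporal encoding stand-in: elementwise Elman-style recurrence with fixed weights."""
    hidden = [0.0] * len(sequence[0])
    for x in sequence:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
    return hidden

frame_vectors = [encode_graph(f) for f in frames]  # one graph representation per frame
video_vector = rnn_encode(frame_vectors)           # one representation for the whole video
```

In this sketch the per-frame vectors play the role of the "graph representation sequence" mentioned in the abstract; swapping `rnn_encode` for a self-attention layer over `frame_vectors` would correspond, loosely, to the temporal Transformer encoder variant.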
Keywords/Search Tags: Action Recognition, Spatio-Temporal Scene Graph, Deep Learning, Text-Video Retrieval, Video Understanding