Research On Video Action Recognition Methods Based On Two-stream Networks

Posted on:2022-07-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Xiong

Full Text:PDF

GTID:2518306536954839

Subject:Computer Science and Technology

Abstract/Summary:

Video action recognition is a representative task in computer vision: it aims to automatically assign semantic labels to human actions by analyzing the spatiotemporal information in videos. With the rapid development of deep learning and the continuous improvement of hardware performance, video action recognition has advanced considerably, and many successful deep neural network models have been proposed for extracting and classifying action features. Nevertheless, developing effective models for action recognition remains challenging. Two-stream networks are among the most popular and effective methods for video action recognition. However, the traditional two-stream architecture cannot model long-term temporal motion information and lacks interaction between the spatial and temporal features. To model spatiotemporal information in videos more effectively, this thesis proposes two kinds of video action recognition models based on two-stream networks. The main contents are as follows:

First, to model long-term video sequences, two kinds of spatiotemporal residual networks are proposed by extending the 2D spatial residual network to the 3D domain. Two kinds of spatiotemporal residual units, based on residual scaling and identity mapping, are constructed to learn local temporal motion features. Stacking several such units through the network hierarchy builds the 3D spatiotemporal residual network and extends the temporal receptive field, making it possible to learn global motion information. Different 3D architectures for modeling long-term motion information, and different methods for initializing the temporal kernels, are explored. The results show that the proposed methods model long-term motion information effectively and that global spatiotemporal features are more effective than local features for video action recognition.

Second, to build interaction between the spatial and temporal features in two-stream networks, two cross-stream interaction strategies (additive and multiplicative interaction) are introduced, making it possible to fuse the two streams at multiple levels of abstraction. Various alternatives for connecting the two streams are systematically explored, and the results show that effective cross-stream interaction further improves performance.

Experiments on the UCF101 and HMDB51 datasets show that the proposed models outperform the traditional two-stream method, which indicates that the optimized methods make better use of spatiotemporal information for video action recognition.
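The abstract names additive and multiplicative cross-stream interaction and residual units built on identity mapping and residual scaling, but gives no formulas. The NumPy sketch below is one possible reading, not the thesis's implementation. The function names, the `(T, H, W, C)` feature layout, the `scale=0.1` factor, the direction of injection (temporal into spatial), and the 3-frame averaging that stands in for a learned temporal convolution are all assumptions made for illustration.

```python
import numpy as np


def residual_unit(x, transform, scale=0.1):
    """Identity mapping plus a scaled residual branch: y = x + scale * F(x).

    `transform` is any shape-preserving function; `scale` is an assumed value.
    """
    return x + scale * transform(x)


def cross_stream_interaction(spatial, temporal, mode="additive"):
    """Combine same-shaped feature maps from the two streams element-wise.

    Assumed layout: (T, H, W, C) = frames, height, width, channels.
    """
    if spatial.shape != temporal.shape:
        raise ValueError("feature maps must have the same shape")
    if mode == "additive":
        return spatial + temporal
    if mode == "multiplicative":
        return spatial * temporal
    raise ValueError(f"unknown mode: {mode}")


def temporal_smooth(x):
    """Toy stand-in for a learned temporal convolution.

    Averages each frame with its two neighbours along axis 0 (edge-padded),
    so the output has the same shape as the input.
    """
    padded = np.pad(x, ((1, 1), (0, 0), (0, 0), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


# Hypothetical feature maps taken from the same depth of each stream.
rng = np.random.default_rng(0)
spatial = rng.standard_normal((8, 4, 4, 16))
temporal = rng.standard_normal((8, 4, 4, 16))

fused_add = cross_stream_interaction(spatial, temporal, "additive")
fused_mul = cross_stream_interaction(spatial, temporal, "multiplicative")

# One spatiotemporal residual unit applied to the additively fused features.
out = residual_unit(fused_add, temporal_smooth, scale=0.1)
```

In this sketch each unit mixes only three neighbouring frames, so stacking several units widens the temporal extent that influences each output frame, which is the intuition behind the extended temporal receptive field described in the abstract.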

Keywords/Search Tags:

Video Action Recognition, Two-Stream Networks, Two-Stream Fusion Networks, Spatiotemporal Residual Network, Additive Interaction, Multiplicative Interaction

Related items

1	Spatiotemporal Squeeze-and-Excitation Residual Multiplier Networks For Video Action Recognition
2	Research On Anomalous Human Action Detection Based On Two-stream Spatiotemporal Residual Networks
3	Research Of Video Action Recognition Based On Two-stream Information Fusion Network
4	Research And Application Of Action Recognition Method Based On Multi Stream Depth Feature Fusion
5	Research On Human Action Recognition Based On Video Stream
6	Research On Video Action Recognition Method Based On Deep Learning
7	Research On Human Action Behavior Recognition Technology Based On Deep Learning
8	Research Of Video Action Recognition Based On Two-stream Convolutional Neural Network
9	Human Action Recognition Based On Two-Stream Network
10	Video Action Research Based On Attention Mechanism And Spatiotemporal Fusion Network